This is an example of data mining in Java.
We count the number of occurrences of a specified word in every file in a specified folder, and then rank the files by that count, highest first.
- We use the class FileInputStream to read the data from a file byte by byte, and the class StringBuffer to collect those bytes into one long string.
- The class File is used to get all the files in a folder and list them in an array.
- The variable iCh is an int. It stores the value of the byte just read (the ASCII code, for plain ASCII text), or -1 once the end of the file is reached.
- Make sure to provide the full path to whichever folder you want to access (e.g. C:\Mining).
- Only use files and folders which you’re sure you have access to.
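
The read loop described in the notes above can be tried on its own. This is only a minimal sketch: the class name ReadLoopSketch and the helper readAll are made up for illustration, and a ByteArrayInputStream stands in for the file so that nothing on disk is touched. It assumes single-byte text such as ASCII, since each byte is cast straight to a char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program
class ReadLoopSketch {
    // Reads the stream one byte at a time and collects the bytes in a StringBuffer
    static String readAll(InputStream in) throws IOException {
        StringBuffer text = new StringBuffer();
        int iCh; // byte value 0-255, or -1 at the end of the stream
        while ((iCh = in.read()) != -1) {
            text.append((char) iCh); // assumption: one byte is one character (ASCII)
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file
        byte[] data = "data mining".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(new ByteArrayInputStream(data))); // prints "data mining"
    }
}
```

Because read() returns an int rather than a byte, the value -1 can signal the end of the stream without clashing with any real byte value.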
```java
//www.MyCoding.net
import java.io.*;

class PageRanking {
    public static void main(String[] args) {
        String fpath, word; //For the folder path and the word to be searched for
        File[] rootFiles;
        File root;
        int[] count;
        BufferedReader br; //To get input from the user
        try {
            br = new BufferedReader(new InputStreamReader(System.in));

            //Input
            System.out.println("Enter the folder path: ");
            fpath = br.readLine();
            System.out.println("Enter the word: ");
            word = br.readLine();

            //Initialize these 3 objects
            root = new File(fpath);
            rootFiles = root.listFiles(); //List the files in the array rootFiles
            if (rootFiles == null) { //listFiles() returns null if the path is not a readable folder
                System.out.println("Cannot list folder: " + fpath);
                return;
            }
            count = new int[rootFiles.length]; //Initialize count array equal to the number of files

            for (int i = 0; i < rootFiles.length; i++) {
                count[i] = searchInFile(rootFiles[i].toString(), word); //Get word count for all files 1-by-1
            }

            //Sorting in descending order of count
            for (int i = 0; i < rootFiles.length; i++) {
                for (int j = i + 1; j < rootFiles.length; j++) {
                    if (count[i] < count[j]) {
                        //swap count
                        int temp = count[i];
                        count[i] = count[j];
                        count[j] = temp;
                        //swap filename
                        File ftemp = rootFiles[i];
                        rootFiles[i] = rootFiles[j];
                        rootFiles[j] = ftemp;
                    }
                }
            }

            //print all the filenames with their corresponding counts
            for (int i = 0; i < rootFiles.length; i++) {
                System.out.println(rootFiles[i] + " \t " + count[i]);
            }
        } catch (Exception e) {
            e.printStackTrace(); //Report the error instead of silently ignoring it
        }
    }

    static int searchInFile(String fpath, String word) {
        StringBuffer file = new StringBuffer(); //To store data coming from file
        int iCh, wcount = 0; //iCh: current byte as an int, -1 at the end of the file
        //try-with-resources closes the FileInputStream even if reading fails
        try (FileInputStream fin = new FileInputStream(fpath)) {
            while ((iCh = fin.read()) != -1) {
                file.append((char) iCh); //Convert the int in iCh to char and append to variable file
            }
            int flen = file.length();
            int wlen = word.length();
            for (int j = 0; j <= flen - wlen; j++) {
                if (file.substring(j, j + wlen).equals(word)) { //check for the word to be present anywhere in the file
                    wcount++; //If present, increment count
                }
            }
        } catch (Exception e) {
            wcount = 0;
            e.printStackTrace();
        }
        return wcount;
    }
}
```
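
To see exactly what the counting loop in searchInFile counts, it helps to run the same loop on a few short strings. The sketch below is illustrative only: the class CountSketch and the method countOccurrences are hypothetical names, and the loop is copied onto a plain String instead of file contents.

```java
// Hypothetical helper class that isolates the counting loop
class CountSketch {
    // Counts every position in text at which word starts
    static int countOccurrences(String text, String word) {
        int count = 0;
        int wlen = word.length();
        for (int j = 0; j <= text.length() - wlen; j++) {
            if (text.substring(j, j + wlen).equals(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("the cat sat on the mat", "the")); // prints 2
        System.out.println(countOccurrences("other", "the")); // prints 1: match inside a longer word
        System.out.println(countOccurrences("aaa", "aa"));    // prints 2: overlapping matches
    }
}
```

The loop is therefore a substring search rather than a whole-word search: it also counts matches inside longer words, as well as matches that overlap each other.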
- Run Command Prompt, Eclipse, or whichever IDE you use as an administrator to get past Access Denied exceptions.
- Use the method equalsIgnoreCase(word) instead of equals(word) to make the search case-insensitive.
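
The effect of swapping equals for equalsIgnoreCase can be checked with a small sketch. The class IgnoreCaseSketch and its two methods are hypothetical names used only for this comparison. They run the same substring loop as the program, once with each method.

```java
// Hypothetical helper class comparing the two comparison methods
class IgnoreCaseSketch {
    // Case-sensitive count, as in the original program
    static int countExact(String text, String word) {
        int count = 0;
        int wlen = word.length();
        for (int j = 0; j <= text.length() - wlen; j++) {
            if (text.substring(j, j + wlen).equals(word)) {
                count++;
            }
        }
        return count;
    }

    // Same loop, but the comparison ignores upper and lower case
    static int countIgnoreCase(String text, String word) {
        int count = 0;
        int wlen = word.length();
        for (int j = 0; j <= text.length() - wlen; j++) {
            if (text.substring(j, j + wlen).equalsIgnoreCase(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java, JAVA and java";
        System.out.println(countExact(text, "java"));      // prints 1
        System.out.println(countIgnoreCase(text, "java")); // prints 3
    }
}
```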