Reading data from a .doc file using the Apache POI API
This program shows how to read text from an MS Word file (.doc) paragraph by paragraph using Apache POI.
I explained what Apache POI is and why it is needed in a previous post; you can find that post here.
To run this program, download the Apache POI API and add its jar files to the classpath.
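Before running the reader, it can help to confirm that the POI jars are really visible to the JVM. The following is a minimal sketch, not part of the original program: the class name PoiClasspathCheck and the method isOnClasspath are made up for illustration, and it only uses Class.forName from the standard library. In the POI releases I have seen, the HWPF classes are shipped in the scratchpad jar rather than the main poi jar, so treat that as an assumption to verify against your download.

```java
public class PoiClasspathCheck {

    // Hypothetical helper: returns true if the named class can be loaded
    // from the current classpath, false otherwise.
    public static boolean isOnClasspath(String className) {
        try {
            // "false" means: locate the class but do not run its static initializers
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two POI classes the reader program imports
        String[] required = {
            "org.apache.poi.hwpf.HWPFDocument",
            "org.apache.poi.hwpf.extractor.WordExtractor"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
        }
    }
}
```

If either class is reported as MISSING, the jar that contains it is not on the classpath used to start the program.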
package multidocument;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

/**
 * Reads a .doc file and prints each paragraph.
 *
 * @author vijay
 */
public class NewDocReader {
    public static void main(String[] args) throws IOException {
        File docFile = new File("c:\\multi\\multi.doc"); // path to the Word document
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream finStream = new FileInputStream(docFile)) {
            HWPFDocument doc = new HWPFDocument(finStream); // throws IOException
            WordExtractor wordExtract = new WordExtractor(doc);
            // dataArray holds one entry per paragraph of the document
            String[] dataArray = wordExtract.getParagraphText();
            for (int i = 0; i < dataArray.length; i++) {
                System.out.println("\n--" + dataArray[i]); // print each paragraph
            }
        }
    }
}
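The program above prints each paragraph exactly as getParagraphText() returns it. In my experience those strings often still carry trailing line-break characters, and empty paragraphs show up as blank entries. The sketch below is one way to tidy the array before printing. It is an illustration only: ParagraphCleaner and clean are hypothetical names, and the sample strings in main are invented to stand in for real extractor output.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphCleaner {

    // Hypothetical helper: strips trailing CR/LF characters from each
    // paragraph and skips entries that are null or blank afterwards.
    public static List<String> clean(String[] paragraphs) {
        List<String> lines = new ArrayList<>();
        for (String p : paragraphs) {
            if (p == null) {
                continue;
            }
            // \z anchors at the very end of the string
            String stripped = p.replaceAll("[\\r\\n]+\\z", "");
            if (!stripped.trim().isEmpty()) {
                lines.add(stripped);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for wordExtract.getParagraphText()
        String[] sample = {"First paragraph\r\n", "\r\n", "Second paragraph\r"};
        for (String line : clean(sample)) {
            System.out.println("--" + line);
        }
    }
}
```

In the reader, the same call would be clean(dataArray) in place of looping over dataArray directly.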



Hi Vijay,
Could you give me your phone number?
How is this possible? I am trying to do that only… How can I access the data from MS PowerPoint?
Rajesh
May 15, 2009 at 4:37 pm
Hi Vijay, I am using the POI interface here but I am getting an error. Is this program not correct?
If so, please give the full details.
What is this "package multidocument;" line?
I hope you reply soon.
thanks
rajdeo
July 12, 2010 at 4:27 pm
Hello Rajesh,
Were you able to execute the above code successfully?
If you want to access MS PowerPoint files:
HSLF is a pure Java implementation of the Microsoft PowerPoint 97(-2003) file format. It supports reading and writing some, but not yet all, of the core records.
Follow the example and note the imports.
This code creates a PowerPoint presentation. We create a SlideShow object, then a Slide object, and then a FileOutputStream to write the .ppt file.
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.usermodel.SlideShow;

public class CreateNewPresentation {
    public static void main(String[] args) {
        // try-with-resources closes the output stream even if writing fails
        try (FileOutputStream out = new FileOutputStream("slideshow.ppt")) {
            SlideShow slideShow = new SlideShow();  // empty presentation
            Slide slide = slideShow.createSlide();  // add one blank slide
            slideShow.write(out);                   // save as .ppt
        } catch (IOException e) {
            e.printStackTrace(); // report the failure instead of silently ignoring it
        }
    }
}
katta vijay
May 15, 2009 at 4:54 pm
Hi Vijay,
Using this API I got the text from the .doc file, but not the original formatting. If a word or sentence is bold or italic, or the content is in a table, I do not get those details along with the text. In an RTF file, for example, this information is contained in the header. Is there any provision to get it?
Vinay
June 22, 2009 at 5:38 pm
Hi Vijay, I am using the POI interface here but I am getting an error. Is this program not correct?
If so, please give the full details.
What is this "package multidocument;" line?
I hope you reply soon.
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
HWPFDocument cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
HWPFDocument cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
WordExtractor cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
WordExtractor cannot be resolved to a type
thanks
rajdeo
July 12, 2010 at 5:01 pm
Hi there, is there a way to show a PPT slide in a Java applet?
veron
June 23, 2009 at 2:47 pm
Hi Vijay!
Can you give an example of how to read a picture from a .doc file? Does it make a difference whether the picture inserted in Microsoft Word is a BMP, JPEG, GIF, or other file type?
Waiting for your reply. Thank you.
Dima
July 15, 2009 at 10:58 pm
Can you also post an example program to write into Word and Excel documents using POI? Thank you!
krish
October 8, 2009 at 2:08 am
Sir Vijay,
I have already downloaded the Apache POI API to C:\. I do not know how to add the jar files to the classpath. Where do I have to save them? Could you please do me a favor?
Thank you and happy coding.
rednahs
October 28, 2009 at 12:33 pm
Please, where can I download the packages needed for "import org.apache.poi.hwpf.HWPFDocument;" and "import org.apache.poi.hwpf.extractor.WordExtractor;"?
emi
February 28, 2010 at 12:31 pm
Hi all,
I am reading a Word document using POI HWPF, but it does not display all image formats, such as BMP, WMV…
Please suggest a process to follow.
Thanks
Jetti
jetti
May 1, 2010 at 3:19 pm
Hi, I want to read the Word file using the POI interface but I am getting an error.
I am including the following package,
but the error is:
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
HWPFDocument cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
HWPFDocument cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
WordExtractor cannot be resolved to a type
An error occurred at line: 10 in the jsp file: /abcpoi.jsp
Generated servlet error:
WordExtractor cannot be resolved to a type
Can anybody help me?
thanks
rajdeo
July 12, 2010 at 5:05 pm
Likewise, I want to read .rtf files.
How can I do that?
Ramesh
October 7, 2010 at 1:11 pm