Introduction

In this article I will show how to convert a Microsoft Word document to HTML in C#. Microsoft Word exposes a COM object library (the Microsoft Word Object Library) that can be used from version 2.0 or later of the .NET Framework. Word must be installed on the machine that runs the conversion.

Implementation

As usual, create a Windows Forms application and add two buttons, a textbox and an OpenFileDialog to the form. The OpenFileDialog lets the user select the source file. The code in this article assumes the controls are named btnOpen (Browse), btnConvert (Convert), txtPath and openFileDialog1.

The next step is essential for converting a Word document to HTML with the Microsoft Word Object Library in C#: you have to add a reference to the library in your project.

1. Right-click your project and select Add Reference.
2. Go to the "COM" tab and select Microsoft Word 12.0 Object Library.

Note: The version number listed depends on the version of Office that is installed, not on the version of Windows. Word 2007 appears as 12.0 and Word 2010 as 14.0.

Once the reference has been added, it appears in the project's References list.

Next, write the code that selects the source file. Double-click the Browse button on the form and write the following:

C# Code
private void btnOpen_Click(object sender, EventArgs e)
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        txtPath.Text = openFileDialog1.FileName;
    }
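}

Now write the code that converts the selected Word document to HTML. Double-click the Convert button and write the following. The code needs using directives for System.IO, System.Runtime.InteropServices and Microsoft.Office.Interop.Word at the top of the file.

C# Code
private void btnConvert_Click(object sender, EventArgs e)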
{
    object missingType = Type.Missing;
    object readOnly = true;
    object isVisible = false;
    object destinationDocFormat = 8; // wdFormatHTML (8) is the HTML format.
    object sourceFile = txtPath.Text;
    object htmlFilePath = Path.Combine(Path.GetDirectoryName(txtPath.Text),
        Path.GetFileNameWithoutExtension(txtPath.Text) + ".htm");

    // Start Word in the background.
    ApplicationClass appclass = new ApplicationClass();
    appclass.Visible = false;
    Document document = null;
    try
    {
        // Open the source document. ReadOnly is the 3rd parameter of
        // Documents.Open and Visible is the 12th.
        document = appclass.Documents.Open(ref sourceFile, ref missingType, ref readOnly,
            ref missingType, ref missingType, ref missingType, ref missingType,
            ref missingType, ref missingType, ref missingType, ref missingType,
            ref isVisible,
            ref missingType, ref missingType, ref missingType, ref missingType);

        // Save a copy as HTML next to the source file.
        document.SaveAs(ref htmlFilePath, ref destinationDocFormat, ref missingType,
            ref missingType, ref missingType, ref missingType,
            ref missingType, ref missingType, ref missingType,
            ref missingType, ref missingType, ref missingType,
            ref missingType, ref missingType, ref missingType,
            ref missingType);
    }
    finally
    {
        if (document != null)
        {
            document.Close(ref missingType, ref missingType, ref missingType);
        }
        // Quit Word so that no hidden WINWORD.EXE process is left running.
        appclass.Quit(ref missingType, ref missingType, ref missingType);
        Marshal.ReleaseComObject(appclass);
        appclass = null;
    }
}

That's all. Run the application, select a Word file and click the Convert button; the converted HTML file is written to the same folder as the source document.

One more tip: Word can save a document in many other formats as well. Pass one of the following WdSaveFormat values as the second argument of SaveAs:

public enum WdSaveFormat
{
    wdFormatDocument97 = 0,
    wdFormatDocument = 0,
    wdFormatTemplate97 = 1,
    wdFormatTemplate = 1,
    wdFormatText = 2,
    wdFormatTextLineBreaks = 3,
    wdFormatDOSText = 4,
    wdFormatDOSTextLineBreaks = 5,
    wdFormatRTF = 6,
    wdFormatEncodedText = 7,
    wdFormatUnicodeText = 7,
    wdFormatHTML = 8,
    wdFormatWebArchive = 9,
    wdFormatFilteredHTML = 10,
    wdFormatXML = 11,
    wdFormatXMLDocument = 12,
    wdFormatXMLDocumentMacroEnabled = 13,
    wdFormatXMLTemplate = 14,
    wdFormatXMLTemplateMacroEnabled = 15,
    wdFormatDocumentDefault = 16,
    wdFormatPDF = 17,
    wdFormatXPS = 18,
    wdFormatFlatXML = 19,
    wdFormatFlatXMLMacroEnabled = 20,
    wdFormatFlatXMLTemplate = 21,
    wdFormatFlatXMLTemplateMacroEnabled = 22,
}

Conclusion

In this article you have learned how to convert a Microsoft Word document to an HTML file in C# with the Microsoft Word Object Library. I hope it helps, and thanks for reading.
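
As a follow-up to the tip about other save formats, here is a minimal sketch of how the conversion could be wrapped in a reusable method that takes a WdSaveFormat value instead of the number 8. The class name WordConverter, the method name SaveAsFormat and the extension parameter are hypothetical and not part of the sample project. The sketch assumes the same Microsoft Word Object Library reference as the rest of the article and an installed copy of Word.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Word;

// Hypothetical helper class, shown only as an illustration.
public static class WordConverter
{
    // Saves the document at sourcePath in the given format, next to the
    // original file, and returns the path of the new file. The caller
    // supplies the file extension that matches the chosen format.
    public static string SaveAsFormat(string sourcePath, WdSaveFormat format, string extension)
    {
        object missing = Type.Missing;
        object readOnly = true;
        object isVisible = false;
        object doNotSave = WdSaveOptions.wdDoNotSaveChanges;
        object source = sourcePath;
        object target = Path.ChangeExtension(sourcePath, extension);
        object fileFormat = format; // boxed enum value instead of a magic number

        // The _Application and _Document interfaces are used so that Quit
        // and Close resolve to the methods rather than the events of the
        // same name.
        _Application word = new ApplicationClass();
        _Document document = null;
        try
        {
            // ReadOnly is the 3rd parameter and Visible is the 12th.
            document = word.Documents.Open(ref source, ref missing, ref readOnly,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing,
                ref isVisible,
                ref missing, ref missing, ref missing, ref missing);

            document.SaveAs(ref target, ref fileFormat, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing);

            return (string)target;
        }
        finally
        {
            if (document != null)
            {
                document.Close(ref doNotSave, ref missing, ref missing);
            }
            // Quit Word so that no hidden WINWORD.EXE process is left running.
            word.Quit(ref doNotSave, ref missing, ref missing);
            Marshal.ReleaseComObject(word);
        }
    }
}
```

With a helper like this, the Convert button handler could call, for example, WordConverter.SaveAsFormat(txtPath.Text, WdSaveFormat.wdFormatFilteredHTML, ".htm") to produce filtered HTML, which omits most of the Office-specific markup that wdFormatHTML keeps.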