Word2CHM released

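This article introduces Word2CHM, an open source C# program that converts MS Word documents to the CHM help file format in three steps: convert the Word document to a single HTML file, split that HTML file into multiple files, and compile them into a CHM file.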

Introduction

Word2CHM is an open source C# program that converts an MS Word document (in 2000/2003 format) to a CHM document. To learn more, visit http://www.sinoreport.net/Word2CHM_Details.aspx .

This is a screen snapshot.

Background

Many people write customer help documents in MS Word, because it is well suited to documents that combine text, images and tables.

But many customers do not want to read help documents in MS Word format; they prefer the CHM format. So it is useful to convert an MS Word document to a CHM document. This is why I built Word2CHM.

Word2CHM

Word2CHM converts an MS Word document to a CHM document in three steps. The first converts the MS Word document to a single HTML file, the second splits that single HTML file into multiple HTML files, and the third compiles those HTML files into a single CHM file.

First, convert the MS Word document to a single HTML file

MS Word supports OLE Automation, so a C# program can host an MS Word application, open a binary MS Word document and save it as an HTML file.

Here is some sample C# code that hosts an MS Word application.
private bool SaveWordToHtml(string docFileName, string htmlFileName)
{
    // check doc file name
    if (System.IO.File.Exists(docFileName) == false )
    {
        this.Alert("File '" + docFileName + "' not exist!");
        return false;
    }
    // check output directory
    string dir = System.IO.Path.GetDirectoryName(htmlFileName);
    if (System.IO.Directory.Exists(dir) == false )
    {
        this.Alert("Directory '" + dir + "' not exist!");
        return false;
    }
    object trueValue = true;
    object falseValue = false;
    object missValue = System.Reflection.Missing.Value;
    object fileNameValue = docFileName;
    // create word application instance
    Microsoft.Office.Interop.Word.Application app =
        new Microsoft.Office.Interop.Word.ApplicationClass();
    // make the Word application visible, so that if something goes wrong
    // and this program quits, the user can close Word manually.
    app.Visible = true;
    // open document
    Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(
        ref fileNameValue,
        ref missValue,
        ref trueValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue);
    // save a html file
    object htmlFileNameValue = htmlFileName;
    object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML;
    doc.SaveAs(
        ref htmlFileNameValue ,
        ref format,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue,
        ref missValue);
    // close document and release resource
    doc.Close(ref falseValue, ref missValue, ref missValue);
    app.Quit(ref falseValue, ref missValue, ref missValue);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
    return true;
}

In this C# source code, it is important to call ReleaseComObject. It releases the program's references to the COM objects (the document and the Word application), so the resources held by Word can be freed.

Many programs that host an MS Word application (or an Excel application) call the application's Quit function when they no longer need it. But sometimes the Word process stays alive after Quit, which is a serious resource leak. Calling ReleaseComObject reduces this risk.
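One gap in the sample above is that an exception thrown by Open or SaveAs skips the cleanup lines entirely. The following is a minimal sketch of a cleanup helper, not code from Word2CHM: the names ComCleanup, SafeRelease and RunThenCleanup are hypothetical. It only assumes the standard Marshal.IsComObject and Marshal.ReleaseComObject calls.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of Word2CHM) for releasing COM objects safely.
public static class ComCleanup
{
    // Releases the runtime callable wrapper if 'candidate' is a COM object.
    // Returns true when a release was attempted, false otherwise.
    public static bool SafeRelease(object candidate)
    {
        if (candidate == null || !Marshal.IsComObject(candidate))
        {
            return false;
        }
        Marshal.ReleaseComObject(candidate);
        return true;
    }

    // Runs 'work' and always runs 'cleanup' afterwards, even when
    // 'work' throws. The exception itself still reaches the caller.
    public static void RunThenCleanup(Action work, Action cleanup)
    {
        try
        {
            work();
        }
        finally
        {
            cleanup();
        }
    }
}
```

In SaveWordToHtml, the Open and SaveAs calls would go into the work delegate, and doc.Close, app.Quit and the two SafeRelease calls into the cleanup delegate, so that Word is shut down on the error path as well.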

Second, split the single HTML file into multiple HTML files

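The HTML file generated by Word contains the entire content of the Word document. For example, suppose a Word document contains a level-1 heading "Header1" followed by the paragraph "Content1", then a level-2 heading "Header2" followed by the paragraph "Content2".

When I save this document as a filtered HTML file, the HTML source looks like the following.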

<html>
       <head>
              <meta http-equiv=Content-Type content="text/html; charset=gb2312">
              <meta name=Generator content="Microsoft Word 11 (filtered)">
              <title>Header1</title>
              <style>
               some style code
              </style>
       </head>
       <body lang=ZH-CN style='text-justify-trim:punctuation'>
              <div class=Section1 style='layout-grid'>
                     <h1><span lang=EN-US>Header1</span></h1>
                     <p class=MsoNormal><span lang=EN-US>Content1</span></p>
                     <h2><span lang=EN-US>Header2</span></h2>
                     <p class=MsoNormal><span lang=EN-US>Content2</span></p>
              </div>
       </body>
</html>

In this HTML source, a single div tag wraps all the content. Word2CHM needs to split this HTML file into two files.

File0.html

<html>
       <head>
              <meta http-equiv=Content-Type content="text/html; charset=gb2312">
              <meta name=Generator content="Microsoft Word 11 (filtered)">
              <title>Header1</title>
       <style>
        --------------
       </style>
       </head>
       <body>
              <h1>Header</h1><hr />
              <p class=MsoNormal><span lang=EN-US>Content1</span></p>
              <hr /><h1>Footer</h1>
       </body>
</html>

File1.html

<html>
       <head>
              <meta http-equiv=Content-Type content="text/html; charset=gb2312">
              <meta name=Generator content="Microsoft Word 11 (filtered)">
              <title>Header2</title>
       <style>
        --------------
       </style>
       </head>
       <body>
              <h1>Header</h1><hr />
              <p class=MsoNormal><span lang=EN-US>Content2</span></p>
              <hr /><h1>Footer</h1>
       </body>
</html>

Here, the program adds the HTML source “<h1>Header</h1><hr />” in front of the topic content and “<hr /><h1>Footer</h1>” after it. This additional HTML source serves as the header and footer.

In Word2CHM, I use the following C# code to split the HTML file.
string strDir = System.IO.Path.GetDirectoryName(fileName);
string strHtml = null;
System.Text.Encoding encoding = System.Text.Encoding.Default ;
using (StreamReader reader = new StreamReader(fileName, encoding, true))
{
    //set content encoding
    encoding = reader.CurrentEncoding;
    //read HTML source code
    strHtml = reader.ReadToEnd();
}
// position of the opening <body ...> tag; everything before it is the header
int bodyIndex = strHtml.IndexOf("<body");
string strHeader = strHtml.Substring(0, bodyIndex);
string strHeader1 = strHeader;
string strHeader2 = null;
int index = strHeader.IndexOf("<title>");
if (index > 0)
{
    strHeader1 = strHeader.Substring(0, index);
    int indexEndTitle = strHeader.IndexOf("</title>");
    strHeader2 = strHeader.Substring(indexEndTitle + 8);
    // read title (the text between <title> and </title>)
    this.strTitle = strHeader.Substring(index + 7, indexEndTitle - index - 7);
}
else
{
    strTitle = System.IO.Path.GetFileNameWithoutExtension(fileName);
}
// skip to the end of the opening <body ...> tag; searching from bodyIndex
// also works when the document has no <title> (index would be -1 there)
index = strHtml.IndexOf(">", bodyIndex);
string strBody = strHtml.Substring(index + 1);
index = strBody.LastIndexOf("</body>");
strBody = strBody.Substring(0, index);
index = strBody.IndexOf("<div");
if (index >= 0)
{
    index = strBody.IndexOf(">", index+1);
    strBody = strBody.Substring(index + 1 );
    index = strBody.LastIndexOf("</div>");
    strBody = strBody.Substring(0, index);
}
//Split html document by tag <h>
index = strBody.IndexOf("<h");
if (index >= 0)
{
    strBody = strBody.Substring(index);
}
else
{
    strBody = "";
}
strBody = strBody.Trim();
int lastLevel = 1;
int lastNativeLevel = 1;
while (strBody.Length > 0)
{
    int Nativelevel = Convert.ToInt32(strBody.Substring(2, 1));
    int level = Nativelevel;
    if (lastNativeLevel == Nativelevel)
    {
        level = lastLevel;
    }
    else
    {
        if (level > lastLevel + 1)
        {
            level = lastLevel + 1;
        }
    }
    lastNativeLevel = Nativelevel;
    lastLevel = level;
    int index2 = strBody.IndexOf(">");
    int index3 = strBody.IndexOf("</h" + Nativelevel + ">");
    //read the text between <hN> and </hN> as the topic title
    string strTitle = strBody.Substring(index2 + 1, index3 - index2 - 1);
    while (strTitle.IndexOf("<") >= 0)
    {
        int index4 = strTitle.IndexOf("<");
        int index5 = strTitle.IndexOf(">", index4);
        strTitle = strTitle.Remove(index4, index5 - index4 + 1);
    }
    strBody = strBody.Substring(index3 + 5);
    index = strBody.IndexOf("<h");
    if (index == -1)
    {
        index = strBody.Length;
    }
    //read topic content
    string strContent = strBody.Substring(0, index);
    // add node to chm document DOM tree
    CHMNode currentNode = null;
    if (this.Nodes.Count == 0 || level == 1)
    {
        //create node
        currentNode = new CHMNode();
        this.Nodes.Add(currentNode);
    }
    else
    {
        CHMNode parentNode = this.Nodes.LastNode;
        while (true)
        {
            if (parentNode.Nodes.Count == 0)
                break;
            if (parentNode.Level == level - 1)
            {
                break;
            }
            parentNode = parentNode.Nodes.LastNode;
        }
        currentNode = new CHMNode();
        //add child node
        parentNode.Nodes.Add(currentNode);
    }
    //set node's name
    currentNode.Name = strTitle;
    strContent = strContent.Trim();
    if (strContent.Length > 0)
    {
        string strHtmlFileName = "";
        CHMNode node = currentNode;
        while (node != null)
        {
            int NodeIndex = node.Index;
            if (node.Parent == null)
                NodeIndex = this.Nodes.IndexOf(node);
            if (strHtmlFileName.Length > 0)
                strHtmlFileName = NodeIndex + "-" + strHtmlFileName;
            else
                strHtmlFileName = NodeIndex.ToString();
            node = node.Parent;
        }
        strHtmlFileName = "File" + strHtmlFileName + ".html";
        currentNode.Local = strHtmlFileName;
        myFiles.Add(strHtmlFileName);
        strHtmlFileName = System.IO.Path.Combine(strDir, strHtmlFileName);
        //Generate topic html file
        using (StreamWriter writer = new StreamWriter(strHtmlFileName, false, encoding))
        {
            if (strHeader2 != null)
            {
                //write header html source
                writer.Write(strHeader1);
                writer.Write("<title>" + strTitle + "</title>");
                writer.Write(strHeader2);
            }
            else
            {
                writer.Write(strHeader);
            }
            writer.WriteLine("<body style=' margin: 0px 0px 0px 0px; padding: 0px 0px 0px 0px;font-family: Verdana, Arial, Helvetica, sans-serif;' >");
            string header = this.HelpHeaderHtml;
            if (header != null)
            {
                //write header html source code
                header = header.Replace("@Title", strTitle);
                writer.WriteLine(header);
            }
            //write html content
            writer.WriteLine(strContent);
            //write footer html source
            writer.WriteLine(this.HelpFooterHtml);
            writer.WriteLine("</body>");
            writer.WriteLine("</html>");
        }
    }
    if (index == strBody.Length)
    {
        break;
    }
    else
    {
        strBody = strBody.Substring(index);
    }
}//while
//add the resource files (images etc.) that Word saved next to the html file
string strFilesDir = System.IO.Path.ChangeExtension(fileName, "files");
if (System.IO.Directory.Exists(strFilesDir))
{
    string dirName = System.IO.Path.GetFileName(strFilesDir);
    foreach (string name in System.IO.Directory.GetFiles(strFilesDir))
    {
        string name2 = System.IO.Path.GetFileName(name);
        name2 = System.IO.Path.Combine(dirName, name2);
        myFiles.Add(name2);
    }

}

With this C# code, I split the HTML file at the heading tags H1, H2, H3 ... Hn, and set each HTML document's title to the text inside its Hn tag.
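The splitting idea can also be expressed with regular expressions, which avoids mistaking tags such as <hr> for headings. This is a minimal sketch under stated assumptions, not the code Word2CHM uses: the Topic and HeadingSplitter names are hypothetical, the input is assumed to be the body HTML with the outer div already removed, and headings are assumed not to be nested inside each other.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one help topic (not a Word2CHM type).
public sealed class Topic
{
    public int Level;       // 1 for <h1>, 2 for <h2>, ...
    public string Title;    // heading text with inner tags removed
    public string Content;  // html between this heading and the next one
}

public static class HeadingSplitter
{
    // Matches <h1>...</h1> up to <h6>...</h6>; \1 forces the closing
    // tag to use the same level as the opening tag.
    private static readonly Regex Heading = new Regex(
        @"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any single tag, used to strip <span> etc. from titles.
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    public static List<Topic> Split(string bodyHtml)
    {
        List<Topic> topics = new List<Topic>();
        MatchCollection matches = Heading.Matches(bodyHtml);
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            // content runs from the end of this heading to the next heading
            int start = m.Index + m.Length;
            int end = (i + 1 < matches.Count) ? matches[i + 1].Index : bodyHtml.Length;
            Topic topic = new Topic();
            topic.Level = int.Parse(m.Groups[1].Value);
            topic.Title = Tag.Replace(m.Groups[2].Value, "").Trim();
            topic.Content = bodyHtml.Substring(start, end - start).Trim();
            topics.Add(topic);
        }
        return topics;
    }
}
```

Applied to the body of the sample document above, Split yields two topics, "Header1" at level 1 and "Header2" at level 2, each with its own paragraph as content. Building the node tree and writing the files would then proceed as in the original code.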

 
 

Third, compile the HTML files into a single CHM file

Word2CHM cannot compile multiple HTML files into a single CHM file by itself; it calls “HTML Help Workshop” to generate the CHM file.

HTML Help Workshop is a Microsoft product. It compiles multiple HTML files into a CHM file and saves its settings in a help project file with the extension hhp.

Word2CHM generates the HHP file and the HHC contents file, then runs the compiler, using the following C# source code.
strOutputText = "";
if (System.IO.File.Exists(compilerExeFileName) == false)
{
    throw new System.IO.FileNotFoundException(compilerExeFileName);
}
string strHHP = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhp");
string strHHC = System.IO.Path.Combine(this.WorkDirectory, strName + ".hhc");
string strCHM = System.IO.Path.Combine(this.WorkDirectory, strName + ".chm");
if (System.IO.File.Exists(strCHM))
{
    System.IO.File.Delete(strCHM);
}
string DefaultTopic = null;
CHMNodeList nodes = this.GetAllNodes();
foreach (CHMNode node in nodes)
{
    if (HasContent(node.Local))
    {
        DefaultTopic = node.Local;
        break;
    }
}
// Generate hhp file
using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(
           strHHP,
           false,
           System.Text.Encoding.GetEncoding(936)))
{
    myWriter.WriteLine("[OPTIONS]");
    myWriter.WriteLine("Compiled file=" + System.IO.Path.GetFileName(strCHM));
    myWriter.WriteLine("Contents file=" + System.IO.Path.GetFileName(strHHC));
    myWriter.WriteLine("Default topic=" + this.DefaultTopic);
    myWriter.WriteLine("Default Window=main");
    myWriter.WriteLine("Display compile progress=yes");
    myWriter.WriteLine("Full-text search=" + (this.FullTextSearch ? "Yes" : "No"));
    myWriter.WriteLine("Binary TOC=" + (this.BinaryToc ? "Yes" : "No"));
    myWriter.WriteLine("Auto Index=" + (this.AutoIndex ? "Yes" : "No"));
    myWriter.WriteLine("Binary Index=" + (this.BinaryIndex ? "Yes" : "No"));
    //myWriter.WriteLine("Index file=" + System.IO.Path.GetFileName( strIndexFile ));
    myWriter.WriteLine("Title=" + this.Title);
    myWriter.WriteLine("[FILES]");
    foreach (CHMNode node in nodes)
    {
        if (HasContent(node.Local))
        {
            if (myFiles.Contains(node.Local) == false)
            {
                myFiles.Add(node.Local);
            }
        }
    }
    foreach (string fileName in myFiles)
    {
        myWriter.WriteLine(fileName);
    }
}
// Generate hhc file
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.AppendChild(doc.CreateElement("hhc"));
ToHHCXMLElement(this.myNodes, doc.DocumentElement);
using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(
           strHHC,
           false,
           System.Text.Encoding.GetEncoding(936)))
{
    myWriter.Write(doc.DocumentElement.InnerXml);
}
// Compile project , generate chm file
ProcessStartInfo start = new ProcessStartInfo(compilerExeFileName, "\"" + strHHP + "\"");
start.UseShellExecute = false;
start.CreateNoWindow = true;
start.RedirectStandardOutput = true;
start.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
System.Diagnostics.Process proc = System.Diagnostics.Process.Start(start);
proc.PriorityClass = System.Diagnostics.ProcessPriorityClass.BelowNormal;
this.strOutputText = proc.StandardOutput.ReadToEnd();
// Delete temporary files
if (deleteTempFile)
{
    System.IO.File.Delete(strHHP);
    System.IO.File.Delete(strHHC);
}
if (System.IO.File.Exists(strCHM))
    return strCHM;
else
    return null;
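The listing above does not show ToHHCXMLElement, so the exact contents file that Word2CHM writes is not visible here. As an illustration only, the sketch below builds a contents file in the sitemap layout that HTML Help Workshop conventionally uses for .hhc files (nested UL/LI lists with "text/sitemap" objects). TocEntry and HhcWriter are hypothetical names, not Word2CHM types.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical table-of-contents node (stands in for CHMNode).
public sealed class TocEntry
{
    public string Name;    // text shown in the contents tree
    public string Local;   // topic file name; may be null for pure folders
    public List<TocEntry> Children = new List<TocEntry>();
}

public static class HhcWriter
{
    // Returns the text of a .hhc contents file for the given root entries.
    public static string Build(IEnumerable<TocEntry> roots)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">");
        sb.AppendLine("<HTML><HEAD></HEAD><BODY>");
        AppendList(sb, roots);
        sb.AppendLine("</BODY></HTML>");
        return sb.ToString();
    }

    // Writes one <UL> level and recurses into child entries.
    private static void AppendList(StringBuilder sb, IEnumerable<TocEntry> entries)
    {
        sb.AppendLine("<UL>");
        foreach (TocEntry entry in entries)
        {
            sb.AppendLine("<LI><OBJECT type=\"text/sitemap\">");
            sb.AppendLine("<param name=\"Name\" value=\""
                + WebUtility.HtmlEncode(entry.Name) + "\">");
            if (!string.IsNullOrEmpty(entry.Local))
            {
                sb.AppendLine("<param name=\"Local\" value=\""
                    + WebUtility.HtmlEncode(entry.Local) + "\">");
            }
            sb.AppendLine("</OBJECT>");
            if (entry.Children.Count > 0)
            {
                AppendList(sb, entry.Children);
            }
        }
        sb.AppendLine("</UL>");
    }
}
```

The returned string would be written to the .hhc path with the same encoding as the topic files. Encoding the Name value matters because heading text may contain quotes or angle brackets.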

The user interface triggers this whole step with the following C# code, which calls CompileProject and reports the result.

string hhcPath = Word2CHM.Properties.Settings.Default.HHCExePath;
if (System.IO.File.Exists(hhcPath) == false)
{
    MessageBox.Show("Can not find executable file '"
        + hhcPath + "' of 'HTML Help Workshop'!");
    return;
}
try
{
    this.Cursor = System.Windows.Forms.Cursors.WaitCursor;
    // CompileProject returns the CHM file name, or null if compiling failed
    string name = myDocument.CompileProject(
        hhcPath,
        Word2CHM.Properties.Settings.Default.DeleteTempFile);
    System.Diagnostics.Debug.WriteLine(myDocument.OutputText);
    if (name == null)
        Alert("Compile error!");
    else
        Alert("Generated file " + name);
}
catch (Exception ext)
{
    Alert("App error:" + ext.Message);
}
finally
{
    // always restore the cursor, even if compiling throws
    this.Cursor = System.Windows.Forms.Cursors.Default;
}

After completing these three steps, Word2CHM has converted a Word document to a CHM file.
