How to generate a GitHub markdown file from Microsoft Word using TypeScript

by Manish Bansal

What? Why would anyone want to generate an MD file from a Microsoft Word document? If that was your first thought after reading the title, let me give you a strong use case.

Consider a situation where you are using Git or any other version control system (VCS) for your project's sources as well as its artifacts. Like most projects, you decide to write the documentation in Microsoft Word and check it into Git. Multiple team members edit the same document and, after editing, check it back into the repository.

Git will maintain the history of your document. But how will you look at the changes made to the document since you last checked it in? Yes, you can use Microsoft Word's Track Changes mode, but isn't that messy? And can you use git diff to check the differences quickly? I would say no, because a docx file is not plain text and a line-by-line diff of it is meaningless.

Then what is the solution? Should you stop using Microsoft Word for documentation? Or should you switch to some other VCS?

I would say neither. How about you keep maintaining your documentation in Microsoft Word, convert it into a markdown (MD) file (in layman's terms, a text file) during the build phase, and check that in as well? If that solution excites you, keep reading.

But before jumping right into the conversion, let me first tell you what exactly a markdown file is.

What is a markdown or an MD file?

Markdown is a lightweight markup language designed to make structured text easy to read and write. It is easy to learn, and all you need to create a document is a text editor.

There are multiple implementations of the language (like GFM, aka GitHub Flavored Markdown). Each of these implementations has its own improvements and features, which are not necessarily compatible with one another.

Each implementation supports common features like paragraphs, blockquotes, headings, and lists. This lets you keep text structured, much as Microsoft Word does. But instead of an internal binary representation, MD files use plain text characters to mark up these features. That makes an MD file a text file rather than a binary file like a docx file.

For example, GitHub's markdown flavor represents each of the features you would normally apply through Word's formatting tools with a few plain text characters.
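As a rough illustration of that plain-text syntax, the sketch below wraps text in a few common GFM constructs. The helper names (`heading`, `bold`, `italic`, `bulletList`, `blockquote`) are made up for this example and are not part of the converter discussed later.

```typescript
// Hypothetical helpers (not from the converter) that show GFM's plain-text syntax.

// A heading is a line prefixed with one to six '#' characters.
function heading(level: number, text: string): string {
  const depth = Math.min(Math.max(level, 1), 6);
  return new Array(depth + 1).join("#") + " " + text;
}

// Bold text is wrapped in double asterisks, italic text in single ones.
function bold(text: string): string {
  return "**" + text + "**";
}

function italic(text: string): string {
  return "*" + text + "*";
}

// Each list item becomes a line starting with "- ".
function bulletList(items: string[]): string {
  return items.map((item) => "- " + item).join("\n");
}

// Every line of a blockquote starts with "> ".
function blockquote(text: string): string {
  return text.split("\n").map((line) => "> " + line).join("\n");
}

console.log(heading(2, "Setup"));
console.log(bold("important") + " and " + italic("subtle"));
console.log(bulletList(["first", "second"]));
console.log(blockquote("quoted line"));
```

Because the result is ordinary text, a tool like git diff can show exactly which characters changed between two versions.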

For the detailed advantages of MD files over Word documents, you can also refer to this article.

OK! I am convinced. Show me the code.

Disclaimer: This project is inspired by TypeScript source code. While browsing it, I found this idea of converting a word document to an MD file. You can see its source code here.

For simplicity, I have removed a few sections of code in my repository. The original code was meant to convert the TypeScript specification document to an MD file, and that document contains lots of customized styles. So, once you are done with this article, you can go through the TypeScript converter code and appreciate its ability to perform more complex conversions.

The complete code mentioned in this article can be found here. It can be divided into three sections:

  1. Gulp configuration
  2. CScript execution
  3. TypeScript main function

As stated earlier, you can convert a Word document to an MD file during the build phase. Any task runner can do this. Here, I have chosen gulp.

In the gulp configuration, I have defined three tasks. The first cleans the build directory, which is pretty standard. The second compiles the TypeScript code. The last one calls CScript to execute the resulting JavaScript.
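The last of these tasks boils down to launching cscript with the compiled script and the document paths. Here is a minimal sketch of how such a command line could be assembled. The function name `buildCscriptArgs` and the file names are assumptions for illustration and are not taken from the project's gulpfile; `//nologo` is cscript's switch for suppressing its startup banner.

```typescript
// Hypothetical helper: assembles the arguments a task runner could pass to cscript.exe.
interface ConvertJob {
  script: string; // compiled JavaScript produced by the TypeScript build step
  input: string;  // Word document to read
  output: string; // markdown file to write
}

function buildCscriptArgs(job: ConvertJob): string[] {
  // Everything after the script name is handed to the script as its own arguments.
  return ["//nologo", job.script, job.input, job.output];
}

// Illustrative paths only; substitute the ones your build uses.
const cscriptArgs = buildCscriptArgs({
  script: "build/word2md.js",
  input: "doc/spec.docx",
  output: "doc/spec.md",
});

// prints: cscript //nologo build/word2md.js doc/spec.docx doc/spec.md
console.log(["cscript"].concat(cscriptArgs).join(" "));
```

A gulp task could then pass this array to Node's `child_process.spawn("cscript", cscriptArgs)` on a Windows build machine.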

What is CScript?

CScript.exe (present in C:\Windows\System32) is the console-based executable of the Windows Script Host, which is used to run scripts. It can interpret scripting languages like VBScript or JScript (Microsoft's dialect of JavaScript). Its sibling, WScript, is meant for windowed applications and runs without an attached console. So if you need a console-based application, CScript is the one to use.

In our project, the main job of CScript is to provide a run-time environment for our script, i.e. the JavaScript. You must be wondering why I haven't used Node instead of CScript to run my JavaScript.

Both provide a run-time environment for JavaScript, but only CScript has built-in support for the Windows Component Object Model (COM). So if you try to run this script via Node, you will get an error like this:

var fileStream = new ActiveXObject("ADODB.Stream");
ReferenceError: ActiveXObject is not defined
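One way to turn that raw ReferenceError into a clearer message is to probe for the constructor before using it. The sketch below is an assumption of mine and not part of the original converter; the function names `hostSupportsActiveX` and `requireActiveX` are hypothetical.

```typescript
// Hypothetical guard (not in the original project): reports whether the current
// host exposes the ActiveXObject constructor. Windows Script Host does; Node does not.
function hostSupportsActiveX(): boolean {
  // `typeof` on an undeclared global never throws, so this probe is safe on any host.
  const probe = new Function("return typeof ActiveXObject;");
  return probe() !== "undefined";
}

// Fails early with an actionable message instead of a bare ReferenceError.
function requireActiveX(): void {
  if (!hostSupportsActiveX()) {
    throw new Error("ActiveXObject is unavailable: run this script with cscript, not node.");
  }
}

console.log("ActiveX available: " + hostSupportsActiveX());
```

Calling `requireActiveX()` at the top of the script keeps the failure close to its cause when someone launches the file with the wrong runtime.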

Now, what is the Component Object Model?

The Component Object Model is a technology developed by Microsoft. It is not a language but a binary standard. As per Microsoft's definition:

The Microsoft Component Object Model (COM) is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft’s OLE (compound documents), ActiveX (Internet-enabled components), as well as others.

In layman's terms, COM objects are interfaces to various runtime objects. (That is why the definition uses the term "binary software components".) COM is not a language but a technique that is agnostic of the programming language.

The only language requirement for COM is that code is generated in a language that can create structures of pointers and, either explicitly or implicitly, call functions through pointers. Object-oriented languages such as C++ and Smalltalk provide programming mechanisms that simplify the implementation of COM objects.

After that, we can use any other language like Java, VB or JavaScript to interact with those COM objects. This in turn gives us access to running applications: in our case, the Microsoft Word application.

So, are you saying we cannot use Node at all here?

No, that is not true. We can also use Node instead of CScript, but we would need to install an additional package called win32com for COM support. Details can be found here.

Final code

To interact with the Word application, the script uses various APIs. And since we are going through COM, I referred to the Word object model:

Word provides hundreds of objects with which you can interact. These objects are organized in a hierarchy that closely follows the user interface. At the top of the hierarchy is the Application object. This object represents the current instance of Word. The Application object contains the Document, Selection, Bookmark, and Range objects. Each of these objects has many methods and properties that you can access to manipulate and interact with the object.

In our script, we first create a Word application object by using ActiveXObject. Once the application object is obtained, the document object is created by passing it the name of the document (taken from the command line arguments of the cscript call).
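The open-use-close sequence described above can be sketched roughly as follows. This is not the article's actual code. The `createObject` parameter is a hypothetical seam that, under cscript, would be `(progId) => new ActiveXObject(progId)`, and the two interfaces only describe the handful of Word object model members used here (`Documents.Open`, `Document.Close`, `Application.Quit`).

```typescript
// Minimal shapes for the parts of the Word object model this sketch touches.
interface WordDocument {
  Close(): void;
}

interface WordApplication {
  Documents: { Open(fileName: string): WordDocument };
  Quit(): void;
}

// `createObject` is a hypothetical injection point, so the flow can be tried without Word.
function withDocument<T>(
  createObject: (progId: string) => WordApplication,
  fileName: string,
  action: (doc: WordDocument) => T
): T {
  const app = createObject("Word.Application");
  try {
    const doc = app.Documents.Open(fileName);
    try {
      return action(doc);
    } finally {
      doc.Close();
    }
  } finally {
    // Always shut Word down, even if the conversion throws.
    app.Quit();
  }
}
```

Wrapping the work in try/finally is a design choice worth keeping in a real script: a conversion that fails halfway would otherwise leave an invisible Word process running on the build machine.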

This document object represents the actual document. It can be used to parse as well as to manipulate the Word document. In our use case, however, we only need to parse the document and write a text file.

The code is very generic: it converts basic features of a Word document, such as cross-references, lists, subscript text, and bold and italic characters, into GFM format. You can write your own code to convert the customized styles of your Word document into the desired format.

You can find the actual TypeScript code here. The code is quite easy to read. Below are a few major highlights:

  1. First, a document object is passed to the convertDocumentToMarkdown function, which returns the text to be written to the MD file.

  2. Inside convertDocumentToMarkdown, methods and properties of the document object are called to find the relevant Word features and replace them with the corresponding GFM syntax. For example, subscript, bold and italic text is searched for first. The text is then replaced by GFM-specific markup. Finally, the Word styles are removed. All this is done here.

  3. After this, cross-references are replaced. This part is tricky. First, the toggleShowCodes function is called. This has the same effect as pressing Alt+F9 in a Word document: it replaces all the cross-references in the document with their field codes. Then the find-and-replace method is called to replace all cross-references with GFM-style links. Here, "19 REF" is passed as an argument to the function (in Word's Find syntax this is usually written with a leading caret, ^19, which matches the opening brace of a field while field codes are shown). This is a standard search criterion for finding all cross-references in a Word document. Finally, toggleShowCodes is called again to bring the document back to its original form.

  4. At last, the writeDocument function is called, which does the main job. It reads the document paragraph by paragraph and uses a switch statement on each paragraph's style (heading, table, list paragraph, image and so on). Depending on the style found, the corresponding text is written to the MD file.
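The paragraph-by-paragraph switch in step 4 can be sketched as below. This is a simplified stand-in, not the converter's writeDocument: the `Para` shape and the `paragraphsToMarkdown` name are assumptions, and only three of Word's built-in style names are handled.

```typescript
// Simplified stand-in for a Word paragraph: its style name and its text.
interface Para {
  style: string;
  text: string;
}

// Hypothetical reduction of the "switch on paragraph style" idea from step 4.
function paragraphsToMarkdown(paragraphs: Para[]): string {
  const out: string[] = [];
  let previousWasList = false;

  paragraphs.forEach((p) => {
    const isList = p.style === "List Paragraph";
    // Separate blocks with a blank line, but keep consecutive list items together.
    if (out.length > 0 && !(isList && previousWasList)) {
      out.push("");
    }
    switch (p.style) {
      case "Heading 1":
        out.push("# " + p.text);
        break;
      case "Heading 2":
        out.push("## " + p.text);
        break;
      case "List Paragraph":
        out.push("- " + p.text);
        break;
      default:
        // Any other style is written out as an ordinary paragraph.
        out.push(p.text);
        break;
    }
    previousWasList = isList;
  });

  return out.join("\n");
}

console.log(
  paragraphsToMarkdown([
    { style: "Heading 1", text: "Introduction" },
    { style: "Normal", text: "Some body text." },
    { style: "List Paragraph", text: "first item" },
    { style: "List Paragraph", text: "second item" },
  ])
);
```

Custom styles from your own document would become additional `case` branches.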

A word or two on embedding images: Embedding images into an MD file is a bit tricky.

First, you need to store the images in your Git repository. Then the MD file has to contain a link to each image in order to embed it. The syntax is ![alt text](path/in/the/repository/image1.jpg).

To auto-generate this link while converting the Word document into an MD file, a piece of hidden text is placed in the document right after the image (without any space), and its content is the link itself. In the code, this hidden text is then extracted and inserted into the MD file.
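A minimal sketch of that convention follows, assuming a paragraph has already been split into runs that carry a `hidden` flag (in Word's object model, hidden text is marked through the Font.Hidden property). The `Run` shape and the `hiddenLink` name are hypothetical and are not taken from the converter.

```typescript
// Hypothetical view of a paragraph as runs of text, some of which are hidden.
interface Run {
  text: string;
  hidden: boolean;
}

// Collects the hidden runs of an image paragraph. By the convention described
// above, they hold the ready-made markdown image link.
function hiddenLink(runs: Run[]): string {
  return runs
    .filter((run) => run.hidden)
    .map((run) => run.text)
    .join("")
    .trim();
}

// prints: ![alt text](path/in/the/repository/image1.jpg)
console.log(
  hiddenLink([
    { text: "![alt text](path/in/the/repository/image1.jpg)", hidden: true },
    { text: "Figure 1: Overview", hidden: false },
  ])
);
```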

You might find the actual code that does all this rather tedious, but that is simply how the API exposed by the Word application works, so do not worry about it. You can refer to my code or to TypeScript's original code. Both are a good starting point for your next project.

Oh wait!! That is it. You made it to the end. Congratulations! If you liked this article, please hit the clap button below. It would mean a lot to me and it will help other people see the story. Cheers!

Source: https://www.freecodecamp.org/news/how-to-generate-a-github-markdown-file-from-microsoft-word-using-typescript-a8976ea958c3/
