Convert LaTeX to docx using Pandoc
Challenges and solutions
Format
Prepare a reference .docx file and apply its styles with the --reference-doc option. One way to do this is to first create a .docx without a reference document, adjust the styles in that file, and then regenerate the .docx using the adjusted file as the reference.
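The three steps can be sketched as two pandoc invocations with a manual edit in between. The snippet only builds and prints the commands; the file names main.tex, main-ref.docx and main.docx are assumptions for illustration.

```python
import shlex

SOURCE = "main.tex"            # assumed input file
REFERENCE = "main-ref.docx"    # assumed name for the tuned reference document

# Step 1: convert once without a reference document to get Pandoc's default styles.
first_pass = ["pandoc", SOURCE, "-o", REFERENCE]

# Step 2 (manual): open main-ref.docx in a word processor, adjust its styles, save.

# Step 3: convert again, taking the styles from the tuned file.
second_pass = ["pandoc", SOURCE, f"--reference-doc={REFERENCE}", "-o", "main.docx"]

for cmd in (first_pass, second_pass):
    print(shlex.join(cmd))
```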
Citation references
See the relevant sections in the Pandoc User’s Guide. Citation styles (.csl files) can be downloaded from the Zotero Style Repository. The options involved are:
--citeproc --bibliography=xxx.bib --csl=xxx.csl
Cross references
Pandoc converts section references to section numbers, but it does not correctly resolve references to tables and figures.
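Inspired by the xr package, one can resolve these references with a filter that reads the labels from the .aux file. When the hyperref package is used, each label definition in the .aux file starts with \newlabel. The regular expression for parsing the \newlabel lines is: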
^\\newlabel{(.+)}{{(.+)}{(.*)}{(.*)}{(.*)}{(.*)}}$
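As a quick check of what this pattern captures, the sketch below applies it to a single \newlabel line. The sample line is made up for illustration (the label fig:setup and its caption are assumptions); a real .aux file may contain entries with nested braces that deserve a closer look.

```python
import re

# Same pattern as above: group 1 is the label, group 2 the printed number.
NEWLABEL = re.compile(r'^\\newlabel{(.+)}{{(.+)}{(.*)}{(.*)}{(.*)}{(.*)}}$')

# Hypothetical line of the kind hyperref writes to the .aux file.
sample = r'\newlabel{fig:setup}{{1}{2}{Experimental setup}{figure.1}{}}'

match = NEWLABEL.search(sample)
label, number, page, title, anchor, extra = match.groups()
print(label, number, page, title, anchor)
```

Only the first two groups are needed to map a label to the number printed in the document.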
Here is an example filter in Python:
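#!/usr/bin/env python
# Pandoc filter: replace unresolved references with the numbers recorded in main.aux.
import re
from functools import lru_cache

from pandocfilters import toJSONFilter

# Matches the \newlabel lines written to the .aux file when hyperref is loaded.
REF_REGEX = re.compile(r'^\\newlabel{(.+)}{{(.+)}{(.*)}{(.*)}{(.*)}{(.*)}}$')


@lru_cache(maxsize=None)
def load_aux(fname: str):
    """Return a dict mapping each label to its number; cached so the file is read once."""
    refs = {}
    with open(fname) as fp:
        for line in fp:
            res = REF_REGEX.search(line)
            if res:
                refs[res.group(1)] = res.group(2)
    return refs


def resolveRef(key, value, format, meta):
    refs = load_aux('main.aux')
    if key == "Link":
        try:
            # An unresolved reference arrives as a link whose text is "[label]".
            res = re.search(r'^\[(.*)\]$', value[1][0]['c'])
            if res:
                value[1][0]['c'] = refs[res.group(1)]
        except (KeyError, IndexError, TypeError):
            # Not a reference, or the label is not in the .aux file: leave the link as is.
            pass


if __name__ == "__main__":
    toJSONFilter(resolveRef)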
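To make the shape of value concrete, here is a small sketch that applies the same substitution to a hand-built Link node, without running Pandoc. The node layout (attributes, inline content, target) and the sample label are assumptions based on how the filter above indexes value; it is worth checking them against the JSON your Pandoc version emits (pandoc -t json). The helper name resolve_link is hypothetical.

```python
import re

def resolve_link(value, refs):
    """Replace a "[label]" link text with its number from refs, if known."""
    inlines = value[1]
    if not inlines or inlines[0].get("t") != "Str":
        return value
    res = re.search(r'^\[(.*)\]$', inlines[0]["c"])
    if res and res.group(1) in refs:
        inlines[0]["c"] = refs[res.group(1)]
    return value

# Hypothetical node for \ref{fig:setup}: attributes, inline content, (url, title).
link = [
    ["", [], [["reference-type", "ref"], ["reference", "fig:setup"]]],
    [{"t": "Str", "c": "[fig:setup]"}],
    ["#fig:setup", ""],
]
refs = {"fig:setup": "1"}  # what load_aux would return for the sample label

resolve_link(link, refs)
print(link[1][0]["c"])
```

Labels that are missing from refs are left untouched, so an unresolved reference stays visible as [label] in the output.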
Undefined math commands
Some commands used in math, such as \tiny, \large, and \text (from amsmath), are not supported by Pandoc. Filters can strip them or replace them with alternatives where necessary.
Here is an example filter in Python:
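#!/usr/bin/env python
# Pandoc filter: strip or replace math commands that Pandoc does not handle.
from pandocfilters import toJSONFilter


def replaceCommands(key, value, format, meta):
    if key == "Math":
        # A Math node holds the math type and the TeX source; only the source is a str.
        for i in range(len(value)):
            if isinstance(value[i], str):
                # Note: str.replace also matches inside longer names such as \textbf.
                value[i] = value[i].replace(r"\tiny", "")
                value[i] = value[i].replace(r"\large", "")
                value[i] = value[i].replace(r"\text", r"\mathrm")


if __name__ == "__main__":
    toJSONFilter(replaceCommands)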
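One caveat with plain string replacement is that \text is also a prefix of commands such as \textbf, and \large of \larger. A possible refinement, sketched here as a standalone function rather than a full filter, uses a regular expression that only matches when the command name ends there. The function name clean_math is hypothetical.

```python
import re

# Command -> replacement; the negative lookahead stops at the end of the name.
REPLACEMENTS = {r"\tiny": "", r"\large": "", r"\text": r"\mathrm"}
PATTERN = re.compile(
    "(" + "|".join(re.escape(cmd) for cmd in REPLACEMENTS) + r")(?![A-Za-z])"
)

def clean_math(tex: str) -> str:
    """Apply REPLACEMENTS to whole command names only."""
    # A function is used as the replacement so backslashes are inserted literally.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], tex)

print(clean_math(r"\large E_{\text{kin}} = \textbf{p}^2 / 2m"))
```

Inside the filter, the three replace calls could then be swapped for a single call to clean_math on the TeX source string.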
Final command
To put them together, the final command would be:
pandoc main.tex \
    --citeproc \
    --bibliography=xxx.bib \
    --csl=physical-review-b.csl \
    --reference-doc=main-ref.docx \
    --filter fix-texmath.py \
    --filter resolve-ref.py \
    -t docx -o main-$(date +%Y%m%d%H%M).docx
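For scripting the build, the same invocation could be assembled in Python. This is a sketch that assumes the file names match the command above; build_command is a hypothetical helper, and the timestamp mirrors the date +%Y%m%d%H%M suffix. The snippet only prints the command; passing the list to subprocess.run(..., check=True) would execute it, provided pandoc is on the PATH.

```python
import shlex
from datetime import datetime

def build_command(now: datetime) -> list:
    """Return the pandoc argument list with a timestamped output name."""
    output = f"main-{now:%Y%m%d%H%M}.docx"
    return [
        "pandoc", "main.tex",
        "--citeproc",
        "--bibliography=xxx.bib",
        "--csl=physical-review-b.csl",
        "--reference-doc=main-ref.docx",
        "--filter", "fix-texmath.py",   # filters run in the order given
        "--filter", "resolve-ref.py",
        "-t", "docx", "-o", output,
    ]

print(shlex.join(build_command(datetime.now())))
```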