I'm not a pandoc user (so far), and I have struggled many times in the past with bugs and missing features in LibreOffice and LaTeX regarding right-to-left text layout and language-specific issues.
My question: How "trustworthy" is pandoc in handling right-to-left content and side-stepping the minefield of target format issues involving such content? Is this subject getting explicit attention from maintainers?
Core contributors are westerners or Russian (US, UK, Switzerland, Germany, Russia), and we rely heavily on user reports to improve non-LTR scripts and languages. But the goal is to make pandoc work flawlessly for everyone.
/If/ you accept that premise, why do you think Pandoc has been so very successful where perhaps other applications written in Haskell have not? The problem domain (something about writing parsers)? The contributors? The culture? Something else entirely?
Of course if you reject that premise I'd also be interested to hear your thoughts on it in as much detail as you care to provide.
Cheers.
But there still may be some truth to the claim. A simple fact is that smaller mind share -> fewer programs -> less chance for extremely successful projects. From personal experience: it took me three tries and multiple months to get comfortable enough with Haskell to the point that I was able to write my first contribution to pandoc (the org-mode parser), despite having dabbled in functional-style Lisp for years before that. But Haskell, as used by pandoc, isn't difficult. In fact, I often find it easier to use Haskell, thanks to its excellent type system. It's just very different and requires a bit more investment up front, with huge benefits lurking down the road.
Data to support my claim that Haskell is actually easy to use: over 300 people have contributed to pandoc, with over 100 contributing Haskell code. Many of those contributors had never written any Haskell before, but the type system helped them find their way.
I talked a bit about the whole topic here: https://youtu.be/JpNEIpLtCHs
Yes, repeatedly, and I'd love to know why you think it matters and what it is indicative of!
Also transforming documents seems like a task well suited to functional languages.
Some command (or commands) that can be wrapped in a script:
> convert2txtViaOCR.sh -i input.pdf -o output.txt
Thanks.
Under US law at least, open source software is commercial: https://dwheeler.com/essays/commercial-floss.html
I'm using Pandoc to write my PhD thesis at the moment, from Markdown source, using certain filters to "augment" what Markdown can do. Examples:
https://github.com/LaurentRDC/pandoc-plot
https://github.com/lierdakil/pandoc-crossref
More info here: https://pandoc.org/filters.html
I wrote a filter that automatically converts URL citations in Markdown to "real" citations in any style you want. It's very useful for writing papers without fighting with BibTeX and managing bibliographies manually: https://github.com/phiresky/pandoc-url2cite
I've also toyed with using it to process code blocks, as a dead-simple literate programming tool.
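A minimal sketch of the code-block idea, assuming the document has first been dumped with `pandoc -t json`. The function names `iter_code_blocks` and `tangle` are made up for illustration; only the JSON shape of `CodeBlock` nodes (attributes first, source text second) comes from pandoc itself.

```python
def iter_code_blocks(node):
    """Yield (classes, source) for every CodeBlock found in a pandoc JSON AST."""
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            # A CodeBlock's content is [[identifier, classes, key-values], source].
            (_ident, classes, _attrs), source = node["c"]
            yield classes, source
        else:
            for value in node.values():
                yield from iter_code_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_code_blocks(item)


def tangle(doc, language):
    """Concatenate, in document order, all code blocks tagged with `language`."""
    sources = [src for classes, src in iter_code_blocks(doc) if language in classes]
    return "\n\n".join(sources) + "\n"
```

Something like `pandoc notes.md -t json` piped into a small script that calls `tangle(json.load(sys.stdin), "python")` would then write out just the Python parts of the notes.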
You can write in markdown and then convert it to word for your uni.
I don't know how they did it, but somehow they put dependency hell on a completely new level.
Yes, I'm sure it's a great tool, but there's a limit to how much bloat I can tolerate for a single program.
$ zypper info --requires pandoc
libm.so.6()(64bit)
libpthread.so.0()(64bit)
libm.so.6(GLIBC_2.2.5)(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
libm.so.6(GLIBC_2.29)(64bit)
libdl.so.2()(64bit)
libdl.so.2(GLIBC_2.2.5)(64bit)
libz.so.1()(64bit)
libc.so.6(GLIBC_2.17)(64bit)
ld-linux-x86-64.so.2()(64bit)
ld-linux-x86-64.so.2(GLIBC_2.3)(64bit)
libgmp.so.10()(64bit)
libpthread.so.0(GLIBC_2.3.2)(64bit)
libm.so.6(GLIBC_2.27)(64bit)
librt.so.1()(64bit)
libutil.so.1()(64bit)
libpthread.so.0(GLIBC_2.12)(64bit)
libnuma.so.1()(64bit)
libnuma.so.1(libnuma_1.1)(64bit)
libnuma.so.1(libnuma_1.2)(64bit)
libffi.so.8()(64bit)
libffi.so.8(LIBFFI_BASE_8.0)(64bit)
libffi.so.8(LIBFFI_CLOSURE_8.0)(64bit)
$ rpm -ql pandoc | grep -v '^/usr/share'
/usr/bin/pandoc
$ ll -h /usr/bin/pandoc
-rwxr-xr-x 1 root root 162M Sep 30 13:33 /usr/bin/pandoc
With pandoc and all the Haskell dependencies, the only downside is the length of the list of packages when you upgrade. If it was all bundled up as haskell-all I doubt I'd even notice.
I have heard of others, like git-annex, but not used them myself. I wonder if there are any I just didn't know were.
I also wonder if anything about Haskell makes it particularly suited as the implementation language for Pandoc. It must have a lot of parsers in it, and Haskell is supposed to be good for coding parsers.
There are parser generation libraries and meta-libraries for certain other languages, notably C++. I wonder what Pandoc in C++ would look like. Probably a pretty good parser meta-library could be spun out of such a project.
I have installed the latest texlive in home directory.
When I invoke 'sudo apt install pandoc' it requires me to install a massive texlive setup at the system level as part of it.
This is not specific to pandoc but many other packages. I have anaconda3 installed in my home, but image-magick requires a massive numpy/scipy system-level install (ignoring for the moment my bewilderment at why would image-magick require numpy/scipy).
I refuse to put up with this kind of bloated bs.
Am I allowed to distribute GPL programs contained inside a Docker image for on-premise installations? Do I just need to provide proper credit and a link to the source code?
Or is there a commercial license available for Pandoc? (I couldn't find anything.)
UPDATE: I've decided to evaluate pandoc and see if it might be useful for supporting Markdown and Word formats, etc. If it is, then I'll reach out to John MacFarlane and ask about a commercial license (or just something in writing), perhaps in exchange for sponsorship on GitHub.
Also, what in the GPL makes it difficult to use in commercial software? You are even free to sell it, after all.
Also, using the AGPL doesn't require a commercial license. Where does that come from?!
Better to just use a GPL-compatible distribution method: pandoc has 349 contributors; none of them signed a copyright assignment, so you'd need permission from each and every contributor to use the software in a way not permitted by the GPL.
If you need a freelancer with deep pandoc knowledge, please do reach out. I'm happy to help.
I would bet many people who use Pandoc have no idea they rely on it. I don't think Jupyter or RStudio make a big fuss about it even though they both use it.
I always ponder whether it’s the most practically useful Haskell tool ever written.
- Babelmark, a tool to compare how different Markdown parsers interpret the same Markdown input. https://johnmacfarlane.net/babelmark2/
- CommonMark, the first formalized Markdown standard, and now the de-facto Markdown standard. https://commonmark.org/ (He's the first listed member of the team.)
I feel like John is probably the single largest contributor to what Markdown is today, other than perhaps the creator of Markdown. Thank you for your work!
The creator of Markdown hasn't touched it in over a decade and yet decided to throw a temper tantrum because CommonMark dared to initially call itself Standard Markdown.
I'm not sure of the specifics, but personally I prefer formats that don't evolve over time. So not changing a spec for over a decade should not be considered pathological but actually commendable, if the spec is complete enough for its purpose.
I know vanilla Markdown is too limited for some use cases. But that is no reason to "overwrite" it.
[1] https://talk.commonmark.org/t/the-logo-and-name-should-proba...
Sure, Gruber didn't allow CommonMark to use the Markdown name, but I feel like that's not a super big deal compared to what he did do. The Markdown ecosystem wouldn't exist if Markdown hadn't been created in the first place! I'm not confident someone would have made something like Markdown if Markdown was never created: AsciiDoc and reStructuredText came out before Markdown but have not been as successful.
Gruber's original Markdown spec lacked formality -- and that's where CommonMark eventually filled the gaps -- but I think that Markdown's focus on user experience over technicality was the key to its success over competing formats and WYSIWYG editors (the real competition). By the time CommonMark came around, Markdown had already seen viral adoption; three of CommonMark's creators are from large companies that were already prominently using Markdown.
tl;dr I think the original Markdown spec and CommonMark are both significant contributions in their own right!
I keep a list of all my skills, experience and education in a YAML file and have a LaTeX template that I clone when creating a new resume. Then it’s just a matter of replacing the template fields with YAML metadata and running Pandoc.
Another example where Calibre complements Pandoc well is when generating ebooks for sideloading onto Kindles. Pandoc can create EPUBs, which Calibre can in turn convert to MOBI.
- Style using XSL-FO: use Pandoc to convert to DocBook, the docbook-xsl XSLT stylesheets to convert DocBook to XSL-FO, and Apache FOP to convert XSL-FO to PDF.
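For anyone who wants to script that XSL-FO route, here is a rough sketch that only builds the three command lines. It assumes `xsltproc` as the XSLT processor (the comment above doesn't name one), and `docbook_fo_pipeline` plus the stylesheet path in the test are placeholders, not anything from the original setup.

```python
from pathlib import Path


def docbook_fo_pipeline(source, fo_stylesheet, pdf):
    """Return the commands for Markdown -> DocBook -> XSL-FO -> PDF, in order."""
    docbook = str(Path(pdf).with_suffix(".xml"))  # intermediate DocBook file
    fo = str(Path(pdf).with_suffix(".fo"))        # intermediate XSL-FO file
    return [
        # 1. Pandoc writes a standalone DocBook document.
        ["pandoc", source, "--standalone", "--to", "docbook", "--output", docbook],
        # 2. The docbook-xsl FO stylesheet turns DocBook into XSL-FO
        #    (xsltproc is an assumption; any XSLT 1.0 processor should do).
        ["xsltproc", "--output", fo, fo_stylesheet, docbook],
        # 3. Apache FOP renders the XSL-FO file to PDF.
        ["fop", "-fo", fo, "-pdf", pdf],
    ]
```

Each entry can be handed to `subprocess.run(cmd, check=True)` in order, so a failure in one step stops the chain. The styling lives in a customization layer on top of the FO stylesheet, which is what you would pass as `fo_stylesheet`.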
I've used pandoc for pdf generation and ffmpeg for some audio recording/encoding/playback. I can't imagine what I would use imagemagick by itself for though (that I wouldn't use some common image processing application for). What do you use imagemagick to do?
Automate various transformations:
- resize
- change orientation or ratio
- adjust colors
- convert format
- do all of the above to generate thumbnails of large photos, in one command
FYI, https://orgmode.org/list/87y2jvkeql.fsf@gnu.org is about enhancing Org's syntax documentation. If you have specific needs/ideas that you'd like to share, please don't hesitate.
https://gist.github.com/imarko/ec8f39550662fcd16908b7ec9d100...
Can be changed to use .txt or .md if preferred.
I most often use http://markup.rocks/ for converting HTML to Markdown and for testing that my reStructuredText syntax is correct when contributing to docs.
Pandoc also has a demo web page for trying it out (https://pandoc.org/try/). The demo supports all of Pandoc’s formats and doesn’t require a large JS download, but it silently truncates inputs to 3,000 characters.
Let me know if there's anything you'd like to see that would make it more useful for you!
EDIT: I've also used this workflow for reading RFCs for OAuth and such. It's basically just a small curl piped straight to say. Sometimes, if I feel like reading an article, I'll add a readability-like CLI tool between the curl and say commands. Unix is awesome!
Been using it with https://github.com/Wandmalfarbe/pandoc-latex-template to generate my documents.
Please comment if there are other nice templates, either for LaTeX or for Doc
However it's not quite done, yet. I'm mostly interested in PDF output, and not having LaTeX was one of the goals, so I use weasyprint for PDF generation. Too bad they are very slow with releases, and I encountered many bugs...
I was on a modeling project that used scripts to generate hundreds of input parameters, embed them in models, run the models, and produce reports. The inputs and outputs shifted a lot over the course of the project, as we came to understand the domain and implications of the work better. At every update, the changes had to be transferred to a Microsoft Word document that went to the project sponsors.
Pandoc made this easy -- we just added scripts to write out the model inputs as Markdown tables, then embed those tables in a larger writeup, also written in Markdown. Pandoc turned it all into a Word document. Thus, the same toolchain that did the actual work, also drove the final report. I really don't think we could have had confidence all the tabular data was right, had it not been automated through Pandoc.
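To give a flavor of the table-writing step, here is a minimal sketch rather than the project's actual script. `markdown_table` is a hypothetical helper, it emits pandoc pipe-table syntax, and it assumes no cell contains a `|` character.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a Markdown pipe table (cells must not contain '|')."""
    def line(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    separator = "|" + "|".join("---" for _ in headers) + "|"
    return "\n".join([line(headers), separator] + [line(row) for row in rows]) + "\n"
```

The returned text gets written into the larger Markdown writeup, and something like `pandoc report.md -o report.docx` then produces the Word document, so the numbers in the report always come from the same run as the model.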
Here is an article where I show how to use Panflute, a library that lets you write filters in Python, and how I wrote a set of filters to automate the tedious parts of writing a complex technical manual:
It's also fantastic for converting my class notes from Markdown with LaTeX equations into beautiful PDFs.
I've been using Pandoc (and make) daily for over 6 years for all sorts of document writing (letter, report, thesis, design doc, performance review, you name it) and to solve the occasional "interesting" format conversion problem. It's robust, reliable, fast, and a pleasure to use (and script).
a large thread from 2018: https://news.ycombinator.com/item?id=17855104
That aside, I find Markdown plus additional features (e.g. LaTeX math, inline code eval), mainly as implemented in RStudio and R Markdown, to be the sweet spot of power, convenience of typing, and legibility in plain-text form. Thanks pandoc!
Flawless!
Pandoc filters allowed me to transform the AST in useful ways. For example I turned the image tag into HTML figures with captions, used the video tag if the URL was a video, and called ffmpeg to encode the video in another format for browsers that didn't support the other format.
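A rough sketch of the image-to-video part of such a filter, working directly on pandoc's JSON AST. This is not the filter described above: the names, the extension list, and the exact `<video>` markup are assumptions, and the ffmpeg re-encoding step is left out.

```python
import os
from html import escape
from urllib.parse import urlparse

# Assumed set of extensions that should be treated as video.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".ogv"}


def is_video(url):
    """True if the path part of the URL ends in a known video extension."""
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    return extension in VIDEO_EXTENSIONS


def images_to_video(node):
    """Return a copy of the AST with video-URL Image inlines replaced by raw HTML."""
    if isinstance(node, list):
        return [images_to_video(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Image":
            # An Image's content is [attributes, alt-text inlines, [url, title]].
            _attr, _alt, (url, _title) = node["c"]
            if is_video(url):
                tag = '<video controls src="{}"></video>'.format(escape(url, quote=True))
                return {"t": "RawInline", "c": ["html", tag]}
        return {key: images_to_video(value) for key, value in node.items()}
    return node
```

Wrapped in a script that reads JSON from stdin and writes the transformed JSON to stdout, this can be passed to pandoc with `--filter`. Raw HTML inlines only show up in HTML-like output formats, so it is a web-only trick.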
+1 for being written in Haskell. Indeed, way back when I became interested in Haskell, I think it was noticing that this tool I was using was written in a strange programming language that influenced me to eventually adopt it for many side projects and to write a little book on it.