[0916 Update]
Added 2 more demo pages:
coolwanglu.github.com/pdf2htmlEX/demo/cheat.html
coolwanglu.github.com/pdf2htmlEX/demo/geneve.html
* Completely removed the Boost dependency
* Relaxed the C++11 requirement; GCC 4.4.6 and later are now supported
* Links are now supported (in-document jumps are accurate to the page)
* Fixed an encoding problem for some fonts.
Demo comes first:
coolwanglu.github.com/pdf2htmlEX/demo/demo.html
(Sorry, I cannot post these as links; the forum keeps rejecting them)
Another (with CJK):
coolwanglu.github.com/pdf2htmlEX/demo/chn.html
Home page:
github.com/coolwanglu/pdf2htmlEX
There are basically 2 types of PDF-to-HTML converters:
One is roughly a PDF-to-text converter that wraps the text in a few predefined HTML formats.
The other is a render-everything-as-images converter, which loses all text and generates huge files.
pdf2htmlEX combines the advantages of both, retaining both text and styling.
Features:
1. Extracts and embeds fonts from the PDF
2. Optimizes for the web while keeping the rendering precise
3. Renders non-text objects as images
4. Single-file output mode -- I know you hate separated font/image files
To compile & install:
grab a recent poppler (>=0.20.3), and make sure '--enable-xpdf-headers' is passed to configure
grab the latest git version of fontforge from
https://github.com/fontforge/fontforge, because I submitted a few features/bugs there for pdf2htmlEX
the Boost C++ library (see the project home page for the exact components required; no longer needed as of the 0916 update)
cmake
a GCC that supports C++11 (4.4.6 or later as of the 0916 update)
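
If you want to check your installed versions against the minimums mentioned in this post (poppler >= 0.20.3, GCC >= 4.4.6), here is a rough sketch. It is not part of pdf2htmlEX; the names MINIMUMS, parse_version and meets_minimum are made up for illustration, and you would feed in whatever version strings your own system reports. The point is to compare versions number by number rather than as plain strings, since "0.9.10" sorts after "0.20.3" as text but is actually older.

```python
import re

# Minimum versions taken from this post; the keys are just labels for this sketch.
MINIMUMS = {
    "poppler": "0.20.3",
    "gcc": "4.4.6",
}


def parse_version(text):
    """Turn the leading dotted number of a string, e.g. '0.20.3', into (0, 20, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(name, installed):
    """Return True if the installed version is at least the minimum listed for name."""
    # Tuples compare element by element, so (0, 9, 10) < (0, 20, 3) as expected.
    return parse_version(installed) >= parse_version(MINIMUMS[name])
```

For example, meets_minimum("poppler", "0.9.10") is False even though a plain string comparison would say otherwise, and meets_minimum("gcc", "4.7.2") is True.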
Personally, I'm using Ubuntu, and I've set up an Ubuntu PPA:
https://launchpad.net/~coolwanglu/+archive/pdf2htmlex
If any of you enjoy this tool and would like to package it for Fedora, please contact me. Many thanks!
Any suggestions, forks/stars on GitHub, and bug reports are appreciated.