Fedora Linux Support Community & Resources Center
Old 31st August 2012, 06:40 PM
coolwanglu Offline
Registered User
Join Date: Aug 2012
Location: Hong Kong
Posts: 1
Introducing pdf2htmlEX: converts PDF to HTML without losing format

[0916 Update]
Added 2 more demo pages:

* Completely removed the Boost dependency
* Relaxed the C++11 requirement; GCC 4.4.6 or later is now supported
* Links are now supported (in-document jumps are accurate to the page)
* Fixed an encoding problem with some fonts
Demo comes first:
(Sorry I cannot create links as it kept getting rejected by the forum)

Another (with CJK):

Home page:

There are basically 2 types of pdf-to-html converters:
One is roughly a pdf-to-text converter that applies a few pre-defined formats in HTML.
The other is a render-everything-as-images converter, which loses all text and generates huge files.

pdf2htmlEX combines the advantages of both, retaining both text and styling:
1. Fonts are extracted from the PDF and embedded
2. Output is optimized for the web while keeping the rendering precise
3. Non-text objects are rendered as images
4. Single-file output mode -- I know you hate separated font/image files
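
To illustrate the single-file idea in point 4: one common way to keep fonts and images inside a single HTML file is to embed them as base64 data URIs. The sketch below shows only that general technique; the post does not say how pdf2htmlEX does it internally, and the function names here are made up for illustration.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI that can be placed directly in HTML or CSS."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image_tag(data: bytes, mime_type: str) -> str:
    """Build an <img> tag whose source is embedded instead of a separate file."""
    return f'<img src="{to_data_uri(data, mime_type)}">'


def inline_font_face(family: str, data: bytes, mime_type: str) -> str:
    """Build a CSS @font-face rule whose font data is embedded in the rule itself."""
    return (
        f"@font-face {{ font-family: '{family}'; "
        f"src: url({to_data_uri(data, mime_type)}); }}"
    )
```

The trade-off is that base64 makes the embedded data roughly a third larger, in exchange for having only one file to copy around.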

To compile & install, you need:
* A recent poppler (>= 0.20.3), configured with '--enable-xpdf-headers'
* The latest git version of fontforge https://github.com/fontforge/fontforge, because I submitted a few feature requests/bug reports there for pdf2htmlEX
* The Boost C++ library (see the project home page for the detailed list of dependencies)
* A GCC that supports C++11
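
Before building, it can help to check that the installed poppler meets the version mentioned above. The following is a minimal sketch, assuming poppler was installed with its pkg-config file (module name `poppler`) and that `pkg-config` is on the PATH; the helper names are my own and not part of pdf2htmlEX.

```python
import re
import subprocess

# Minimum poppler version stated in the post.
MIN_POPPLER = (0, 20, 3)


def parse_version(text):
    """Turn a version string such as '0.20.3' into a tuple of ints (0, 20, 3)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def version_at_least(found, required):
    """Return True if the version string `found` is not older than `required`."""
    return parse_version(found) >= required


def installed_poppler_version():
    """Ask pkg-config for the poppler version; return None if it is unavailable."""
    try:
        result = subprocess.run(
            ["pkg-config", "--modversion", "poppler"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    found = installed_poppler_version()
    if found is None:
        print("poppler not found via pkg-config")
    elif version_at_least(found, MIN_POPPLER):
        print("poppler", found, "is recent enough")
    else:
        print("poppler", found, "is too old, need >= 0.20.3")
```

This only checks the version number; it cannot tell whether poppler was configured with '--enable-xpdf-headers'.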

Personally I'm using Ubuntu, and I've set up an Ubuntu PPA: https://launchpad.net/~coolwanglu/+archive/pdf2htmlex

If any of you enjoy this tool and would like to package it for Fedora, please contact me. Many thanks!

Any suggestions, forks/stars on GitHub, and bug reports are appreciated.

Last edited by coolwanglu; 16th September 2012 at 03:49 PM.