ScroogeXHTML 3.1 API documentation: Overview
Quick Start
This overview document contains general information about the
ScroogeXHTML JavaBean(tm). All JavaBean(tm) properties are
described in the documentation for ScroogeXHTMLBase.
The conversion methods are defined in the class ScroogeXHTML and
documented in the package summary document de.betabeans.scroogexhtml.
Installation
To install ScroogeXHTML in your software development environment
or IDE, follow the instructions in your software manual for
installing the JavaBean(tm) class ScroogeXHTML. A JavaBean(tm) can also
be instantiated and used without an IDE, simply by creating a new
instance of the bean class.
Requirements
ScroogeXHTML requires JRE 1.3 or higher.
ScroogeXHTML for Delphi(tm)
A component version for Borland(tm) Delphi(tm) is also available
from BetaSoft.
About XHTML
XHTML is compatible with both HTML and XML. Because of its
XML compatibility, XHTML can be integrated into next-generation
internet systems and works in browsers for devices such as TV
set-top boxes and mobile phones. More information about XHTML can be
found at http://www.w3c.org.
The XHTML code generated by ScroogeXHTML can be displayed by
many browsers and has passed the online W3C validation service.
Please note that ScroogeXHTML allows custom HTML code to be
inserted; this code is not checked automatically for XHTML
compliance.
About RTF
The RTF specification is available from Microsoft(tm). Some web
sites, for example Wotsit, have the RTF specification online.
About UTF-8
Wikipedia entry about UTF-8: http://en.wikipedia.org/wiki/UTF-8
Additional information on HTML browser support
Checklist for HTML character coding: this page presents a number
of character repertoire scenarios and makes recommendations for
optimising accessibility on older browsers and browser versions
(a German version is also available).
References
HTML 4.01 Specification, W3C Recommendation, Dave Raggett,
Arnaud Le Hors, Ian Jacobs, 24 December 1999.
See: http://www.w3.org/TR/1999/REC-html401-19991224
XHTML 1.0: The Extensible HyperText Markup Language, W3C
Recommendation, Steven Pemberton, et al., 26 January 2000.
See: http://www.w3.org/TR/2000/REC-xhtml1-20000126
Extensible Markup Language (XML) 1.0 (Second Edition), W3C
Recommendation, Tim Bray, Jean Paoli, C. M. Sperberg-McQueen,
Eve Maler, 6 October 2000.
See: http://www.w3.org/TR/2000/REC-xml-20001006
Cascading Style Sheets, level 1, W3C Recommendation,
17 December 1996, revised 11 January 1999.
See: http://www.w3.org/TR/REC-CSS1
Cascading Style Sheets, level 2, CSS2 Specification, W3C
Recommendation, 12 May 1998.
See: http://www.w3.org/TR/REC-CSS2/
Known problems, unsupported RTF elements
- embedded pictures, computed fields, tables and tab stop
  positions cannot be converted with ScroogeXHTML
- font names which contain a ';' are not supported
- a tab character will be replaced by a sequence of
  eight non-breaking space characters
- only alphabetic characters in the "Symbol" font will be
  converted (to Greek characters)
- using Wingdings characters in Web pages is a bad idea, because
  the font does not use Unicode encoding and is not
  available on all computers. The intended Wingdings
  characters may not appear on computers running
  non-Microsoft operating systems such as Mac OS 9, Mac OS
  X 10 or Linux. The intended characters are also unlikely
  to appear on computers that are running Windows when
  using a standards-compliant browser such as Mozilla,
  Netscape 7 or Opera 6 or 7. The same problems are found
  with the Webdings, Wingdings 2 and Wingdings 3 fonts;
  they should not be used in Web pages.
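The tab replacement described in the list above can be illustrated with a short sketch. It is not taken from the ScroogeXHTML sources: the class name TabExpander and the method expandTabs are hypothetical, and writing the non-breaking space as the decimal reference &#160; is an assumption about the output form.

```java
/** Illustrative helper only; not part of the ScroogeXHTML API. */
final class TabExpander {

    /** Number of non-breaking spaces that stand in for one tab. */
    private static final int SPACES_PER_TAB = 8;

    /** Decimal character reference for U+00A0 (assumed output form). */
    private static final String NBSP = "&#160;";

    /** Replaces every tab character with eight non-breaking spaces. */
    static String expandTabs(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                for (int n = 0; n < SPACES_PER_TAB; n++) {
                    out.append(NBSP);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandTabs("Name:\tValue"));
    }
}
```

Because a fixed number of spaces is inserted regardless of the column, text that relied on tab stops for alignment will generally not line up after conversion.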
ScroogeXHTML Revision history
2008-06-29: Revision 3.1
- Use no escape XHTML ' entity for single quote
- Improved ANSI character handling
2008-04-05: Revision 3.0
2007-07-10: Revision 2.9
- Improved support for parameter values in the range
  -2^63..2^63-1.
2006-03-11: Revision 2.8
Conversion speed has been improved; this version is about
60-80% faster.
Bugs fixed in 2.8
- Fixed a bug in ScroogeXHTMLBase#getDefaultFontStyleDefinition()
Changes in 2.8
- Added simple PlainText conversion
- Added property convertIndent (note: defaults to true,
  should not break existing code)
- Added support for "pnlvlcont" RTF token
- Added support for "pict" RTF token to route picture data
  to DG_IGNORE
- Added support for font definitions (in \fonttbl) which
  are not embedded in braces
- Added private method isAlpha to ScroogeXHTMLMain class
  (see JavaDoc for more information)
- Added check for EOF ("empty stack") in main conversion
  loop
- Added class "Formatter" which controls line breaks and
  indentation
- Added class "TranslatorFactory" which creates the
  Translator object
- Changed method CharacterProperties#DeepCopy
- Changed ScroogeXHTMLMain#getLeadingHTMLTags(): it adds no
  empty line if the getStyleSheetInclude() property is
  empty
- Changed design of the logging helper classes to allow
  easy migration to Log4J. For more details, see the docs
  for package "logging"
- Changed source to be more compliant with Sun's Java
  coding guidelines
- Changed classes to final classes whenever possible
- Changed from Vector to List and ArrayList
- Changed from Enumeration to Iterator
- Renamed method "textElementToXHTML" to "process"
- Renamed class "XHTMLMobileProfile10Translator" to
  "XHTMLMobileProfile10"
- Renamed class "ScroogeXHTMLReader" to "RTFReader"
- Renamed class "ScroogeXHTMLUnicode" to
  "UnicodeConverter"
- Renamed class "ScroogeXHTMLWriter" to "DOMWriter"
- Renamed class "ScroogeXHTMLDocParagraph" to "Paragraph"
- Renamed class "ScroogeXHTMLDocText" to "FormattedText"
- Renamed class "ScroogeXHTMLDocument" to "Document"
- Moved converter classes to new package
  de.betabeans.scroogexhtml.converter
2004-03-04: Revision 2.7
Major changes:
- Added support for right-to-left languages
- Added support for XHTML Mobile Profile 1.0 document type
- Added support for point, em, ex or percent font sizes
  (property FontSizeScale)
- Added support for language attributes (lang="..")
- Added support for Double Byte Character Set strings in
  font names
Minor changes:
- Added JUnit tests which perform XML validation
- Added and improved JavaDoc documentation
- Fixed a problem which caused a NullPointerException if
  there was no default font with font number zero
- Fixed a problem in the XHTML Transitional translator
  which generated an invalid parameter for paragraphs with
  'justify' alignment
- Removed all deprecated properties and methods
- Removed unused class LogAdapter
2003-08-01: Revision 2.6
Major changes:
- Added Double Byte Character Set support for Japanese,
  Simplified Chinese, Traditional Chinese and Korean
Minor changes:
- Added JUnit tests
- Added method setTagStyle(String tagName, String style),
  which allows defining an additional CSS style parameter
  for <p>, <br /> and <li> tags
- Added method setTagClass(String tagName, String className),
  which allows defining a CSS class for <p>, <br /> and <li>
  tags. Example: setTagClass("p", "pink_border") will change
  the conversion of <p> tags to <p class="pink_border">
- Added support for \up0 and \dn0 tokens (switch
  super/subscript off)
- Added optimizations in ScroogeXHTMLDocument.buildHtml() -
  ScroogeXHTMLDocParagraph.java, ScroogeXHTMLDocument.java
- Added a fix for a bug which added font color "#000000"
  to hyperlink text
- Added a bugfix for documents which only contain empty
  paragraphs
- Removed ShowMessages property
- Removed unused constant INDENTCRLF - ScroogeXHTMLBase.java
- Removed method 'public void add(DocumentNode node)' from
  DocumentNode interface and all implementing classes
- Removed isEmpty method
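The effect of setTagClass(String tagName, String className) from the list above can be pictured with a small stand-alone sketch. Only the method signature and the "pink_border" example come from this document; the class TagClassSketch, the method openTag and the map-based storage are assumptions made for illustration and say nothing about how ScroogeXHTML implements the feature. The sketch also uses generics, so it needs a newer Java version than the JRE 1.3 minimum of the library.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in; not the ScroogeXHTML implementation. */
final class TagClassSketch {

    /** CSS class name registered per tag name, e.g. "p" -> "pink_border". */
    private final Map<String, String> tagClasses = new HashMap<>();

    /** Mirrors the documented signature setTagClass(tagName, className). */
    void setTagClass(String tagName, String className) {
        tagClasses.put(tagName, className);
    }

    /** Hypothetical helper: builds the opening tag, adding class="..." if one is set. */
    String openTag(String tagName) {
        String className = tagClasses.get(tagName);
        if (className == null) {
            return "<" + tagName + ">";
        }
        return "<" + tagName + " class=\"" + className + "\">";
    }

    public static void main(String[] args) {
        TagClassSketch sketch = new TagClassSketch();
        sketch.setTagClass("p", "pink_border");
        System.out.println(sketch.openTag("p"));
        System.out.println(sketch.openTag("li"));
    }
}
```

The sketch covers only opening tags such as <p> and <li>; an empty element like <br /> would need its own handling.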
2003-06-19: Revision 2.5
Major changes:
- Added support for XHTML Basic 1.0, HTML 4.01 Strict and
  HTML 4.01 Transitional
- Added performance improvements for the debugger and
  logger classes
- Added BeforeTextConversionEvent, which receives the text
  before it is encoded and entities are replaced, and
  changed the Unicode and XHTML encoding procedures to
  allow 'deferred' conversion
- Added property 'useAposTag', which switches conversion of
  the apostrophe between &apos; and &#39;. Default: true
- Added property 'convertEmptyParagraphs', which optionally
  replaces <p></p> with <br />. Default: false
- Changed source code to be compliant with the Java coding
  style guide
- New properties 'convertEmptyParagraphs', 'logger',
  'loggingEnabled'
- Symbol font support (Greek alphabet)
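The choice that the 'useAposTag' property controls can be sketched as follows. An apostrophe can be written either as the named entity &apos;, which is defined by XML and XHTML but not by HTML 4, or as the numeric reference &#39;, which older browsers also understand. The class and method names below are hypothetical, and the assumption that true selects &apos; is a guess based on the property name.

```java
/** Illustrative only; not the ScroogeXHTML implementation. */
final class ApostropheSketch {

    /**
     * Escapes each apostrophe in the text.
     * Assumption: useAposTag == true selects the named entity,
     * false selects the numeric character reference.
     */
    static String escapeApostrophes(String text, boolean useAposTag) {
        String replacement = useAposTag ? "&apos;" : "&#39;";
        return text.replace("'", replacement);
    }

    public static void main(String[] args) {
        System.out.println(escapeApostrophes("it's", true));
        System.out.println(escapeApostrophes("it's", false));
    }
}
```

The numeric form is the safer choice when the output may be served as HTML to older browsers.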
Minor changes:
- Added performance improvements for the debugger and
  logger classes
- Added optimization for internal
  ScroogeXHTMLDocParagraph.isEmpty() method
- Added LogInterface with DefaultLogger and EmptyLogger
  implementations
- Added LogAdapter interface and implementation class
  ScroogeXHTMLLogger to support custom logger
  implementations
- Added property 'loggingSupported' which decides if a
  DefaultLogger or the EmptyLogger will be used
- Added Encoder interface and implementation class
  ScroogeXHTMLEncoder
- Changed debugger implementation, debug mode is now about
  100% faster
- Changed source code to be compliant with the Java coding
  style guide
- Changed Encoder interface and implementation class
  ScroogeXHTMLEncoder to Translator interface,
  XHTMLTranslator, XHTML10StrictTranslator,
  XHTML10TransitionalTranslator
- Changed debugger output, uses a style sheet
- Changed package structure: classes and interfaces for
  bean methods are now in package
  de.betabeans.scroogexhtml.methods
2003-04-06: Revision 2.4
Major changes:
- Added definitions for special entities to
  ScroogeXHTMLWriter class. Note: there are two possible
  declarations for the constant XHTML_SPECIAL_ENTITTY_APOS
  (the apostrophe mark, U+0027 ISOnum). By default, the
  XHTML standard compliant code "&apos;" will be used
  by ScroogeXHTML. To support older browsers, however, it is
  also possible to use the second declaration, using the
  code "&#39;"
- Added support for simple numbered lists
- Added support for documents which use different
  character sets. Based on the character set which is
  assigned to a font, characters will be translated to
  Unicode.
- Changed default document content type to UTF-8
- Added conversion for single quote character to
  "&apos;"
- Added support for left, right and first line paragraph
  indent
- Added support for highlight color
- Added property 'IncludeDefaultFontStyle': it sets the
  font attributes of the "BODY" tag, so all text in the
  document body will have the defined default font style
- Renamed the 'textElement' event to 'afterTextConversion'
Minor changes:
- Changed encoding for Unicode characters from hexadecimal
  (&#x...;) to decimal (&#...;)
- Changed conversion for unknown character sets: do not
  use Cp1252 as default
- Added support for font background color in
  OpenOffice.org RTF documents
- Added support for the 'bullet' keyword as a workaround
  for OpenOffice.org Writer bullet lists
- Changed method "replaceHyperlink" to use target="_new"
  only if XHTMLTransitional = true
- Fixed a bug which disabled conversion of 'justified'
  paragraphs
- Fixed a bug in Unicode support
- Changed translation for &zwnj; / &zwj; special
  characters to Unicode
- Added new property 'Version'
- Removed unused ScroogeXHTMLBeanInfo class
- Added support for "\line" token (required line break)
- Added property "IncludeXMLDeclaration" which inserts the
  line <?xml version="1.0"?>. Note: default = true
- Added filter for invalid form feed characters
- Added automatic deletion of empty paragraphs at the end
  of the generated document
- Changed some logging messages to have a lower level
- Added method 'getFormattedMessage()' to LogEvent class
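The decimal character references mentioned in the revision 2.4 changes can be illustrated with a minimal sketch. It is not the ScroogeXHTML code: the class NumericReferences and the method toDecimalReferences are hypothetical names, and the rule that every code point above 127 is written as a reference is an assumption. Escaping of markup characters such as & and < is deliberately left out.

```java
/** Illustrative only; not the ScroogeXHTML implementation. */
final class NumericReferences {

    /**
     * Writes every non-ASCII code point as a decimal character
     * reference of the form &#NNN; and copies ASCII unchanged.
     */
    static String toDecimalReferences(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by one or two chars, so surrogate pairs stay together.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDecimalReferences("caf\u00e9"));
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane as a single reference instead of two surrogate values.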
2002-12-13: Revision 2.3
- Added new conversion method public String convert(String rtf)
2002-05-11: Revision 2.2
- Changed all event handlers to conform to the JavaBeans
  standard. All events can now be accessed in the NetBeans IDE.
2002-05-05: Revision 2.1
- Added GUI demo application
- Added support for \~ token (non-breaking space)
- Added property "replaceFonts"
- Added property "logLevel"
- Added EventListener "log", using the Log4J logging
  levels (DEBUG, INFO, WARN, ERROR, FATAL)
- Changed output on System.err / System.out to log method
  calls
- Changed showMessages default to false
- Changed showMessages implementation to write to
  System.out (works without a log event listener)
- Removed deprecated profiler class
- Fixed a bug in the ScroogeXHTMLReader.finishFontName
  method (use single quotes)
- Fixed a bug in the ScroogeXHTMLWriter.getWriterstate
  method (use deepCopy method)
- Fixed a bug in the ScroogeXHTMLReader class
  (initialization of curFontNr)
2002-03-01: Revision 2.0
Copyright © 1998-2006 Michael Justin. All rights reserved. Java,
JavaBean, JDK, Sun, Sun Microsystems, and the Sun Logo are
trademarks or registered trademarks of Sun Microsystems,
Inc. in the U.S. and other countries. The Jump to Java Logo
is a trademark or registered trademark of Sun Microsystems,
Inc. in the U.S. and other countries. All Borland brand and
product names are trademarks or
registered trademarks of Borland. Microsoft, Windows,
Windows NT, and/or other Microsoft products referenced
herein are either registered trademarks or trademarks of
Microsoft Corporation in the United States and/or other
countries. Other brands and their products are trademarks of
their respective holders.
Copyright (c) 1998-2006 BetaSoft Michael Justin. All Rights Reserved.