Re: Question: OCR software and long 'esses'



Renae,

You might take a look at abbyusa.com -- their FineReader product
-- at least as competition for OmniPage. FineReader is expensive,
at $300, but the latest OmniPage Pro 14 Office appears to cost
$600 retail now, ouf... it used to come packaged with hardware,
"for free"... There is a downloads tab at www.abbyusa.com which
will give you a 15-day free trial of FineReader.

I have not used FineReader myself, but there are lots of
favorable reviews about it online. For example,

	http://products.consumerguide.com/cp/office/review/index.cfm/id/25438

I have used OmniPage in the past but have never been very happy
with its French, let alone with its "swash 's'" capabilities.

There are European-made OCR packages which address the general
character set problem, and may offer you the text sensitivity or
at least flexibility which you seek: for example --

	* Readiris
	http://www.irislink.be/ps/fr/products/corporate/features/

	"The best multilingual OCR software"

 	"Recognizes up to 108 languages and offers 7 user
	interface languages (English, French, German,
	Spanish, Italian, Dutch, Brazilian Portuguese).

	"Recognized languages: French, Afrikaans, Albanian,
	American English, British English, Pidgin English,
	Aymara, Balinese, Basque, Bemba, Bikol, Bislama..."

-- always easier to blame someone, when something doesn't work,
if they specifically advertised that it would...

One feature of FineReader which seems well-designed for your
particular "swash 's'" concern, and looks applicable to other
eccentric rare-book problems as well, is their "Pattern
Training":

	Tools=>Options=>Recognition=>Pattern Training

-- they just explained to me that it builds a dot-matrix "box"
around any particular character which you wish to single out and
identify yourself -- for example, the occurrences of "swash 's'"
in a given text. They allow you to regulate the size of the box
and hence the text included -- thus allowing for ligatures as
well, I suppose -- and the software then follows your own
instruction as to what to call the thing.
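
To make the general idea concrete, here is a rough Python sketch of
"train a pattern, then match against it". It is only an illustration
under my own assumptions, not FineReader's actual method: the tiny
bitmaps, the pixel-difference matching, and every name in it
(parse_glyph, hamming, PatternLibrary) are hypothetical.

```python
# Illustrative sketch only: NOT how FineReader implements Pattern Training.
# Each glyph "box" is a tiny binary bitmap; all names here are hypothetical.

def parse_glyph(rows):
    """Turn rows of '#'/'.' text into a tuple of tuples of 0/1 pixels."""
    return tuple(tuple(1 if ch == "#" else 0 for ch in row) for row in rows)


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


class PatternLibrary:
    """Stores user-labelled glyph boxes and matches new boxes against them."""

    def __init__(self):
        self.patterns = []  # list of (bitmap, label) pairs

    def train(self, bitmap, label):
        """Record what the user says this boxed glyph should be called."""
        self.patterns.append((bitmap, label))

    def recognize(self, bitmap):
        """Return the label of the stored pattern with the fewest differing pixels."""
        best = min(self.patterns, key=lambda p: hamming(p[0], bitmap))
        return best[1]


# A long 's' has only a left-hand nub where an 'f' has a full crossbar.
LONG_S_ROWS = [".##.", ".#..", "##..", ".#..", ".#..", ".#.."]
F_ROWS      = [".##.", ".#..", "###.", ".#..", ".#..", ".#.."]
# A worn long 's' from the page: one pixel of the top hook is missing.
WORN_ROWS   = [".#..", ".#..", "##..", ".#..", ".#..", ".#.."]

if __name__ == "__main__":
    library = PatternLibrary()
    library.train(parse_glyph(LONG_S_ROWS), "s")  # the user's own instruction
    library.train(parse_glyph(F_ROWS), "f")
    print(library.recognize(parse_glyph(WORN_ROWS)))
```

In this toy case the worn glyph differs from the trained long 's' by
one pixel and from the 'f' by two, so it is labelled "s". Real scans
would need far larger boxes and more tolerant matching.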

Again I have not used this myself but it does sound very useful:
one of those "about time" and "why haven't they offered this
before" custom-tailoring features. I do not know whether OmniPage
offers anything similar -- perhaps their "Speed Zoning"? -- they
do let you develop your own "libraries" of terms, I believe,
which can be helpful but is not really the same thing.


Unfortunately, paper qualities, of both your original and your
output, and font types / sizes / qualities, and printer
capacities, and binding and "exposure to light damage" problems,
all probably will have at least as much to do with your OCR
success on old texts as the software does. But at least
FineReader and the European "language" packages may give you some
alternatives, to weigh against $$$ industry leader OmniPage.


Jack, kessler@well.com

P.S. OmniPage "plugs in" well to networks and other applications.
You might consider this if you have that need. They have been
around for a long time and I believe are the largest, now:

	http://www.scansoft.com/omnipage/


On Tue, 18 May 2004, Renae Satterley wrote:

> Has anyone had occasion to use an OCR software that
> will reliably recognize long 'esses' rather than
> treating them as 'f's'?  I have searched the ExLibris
> archives, but did not see any thread that mentions
> this discussion.
>
> Thank-you,
> Renae Satterley

