CentOS 7: How to build Tesseract OCR from source


Step 1: build the Leptonica library

Follow the separate guide first: CentOS 7 : how to build Leptonica Library

Step 2: download the source

[root@tutorialspots ~]# wget https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz
--2020-08-14 07:44:32--  https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.1 [following]
--2020-08-14 07:44:32--  https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.1
Resolving codeload.github.com (codeload.github.com)... 140.82.113.10
Connecting to codeload.github.com (codeload.github.com)|140.82.113.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘4.1.1.tar.gz’

    [  <=>                                  ] 1,974,988   5.84MB/s   in 0.3s

2020-08-14 07:44:33 (5.84 MB/s) - ‘4.1.1.tar.gz’ saved [1974988]
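
The wget log reports the length as unspecified, so a truncated download would not stand out. Below is a minimal sketch for checking the archive before unpacking it. The function name archive_topdir is made up for illustration and is not part of wget or tar.

```shell
# Print the top-level directory (or directories) inside a tarball.
# Returns non-zero if tar cannot read the archive at all.
# tar detects gzip compression on its own when listing.
archive_topdir() {
    _listing=$(tar -tf "$1") || return 1
    printf '%s\n' "$_listing" | cut -d/ -f1 | sort -u
}
```

Running archive_topdir 4.1.1.tar.gz should print a single name, tesseract-4.1.1, which is the directory entered in the next step.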

Step 3: extract the archive and set the build environment

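# Unpack the source and enter the source directory
tar xvf 4.1.1.tar.gz
cd tesseract-4.1.1
# Point the build at Leptonica's headers, shared library and pkg-config file
# (this assumes Leptonica was installed under /usr/local in Step 1)
export LIBLEPT_HEADERSDIR=/usr/local/include/leptonica
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig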
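
One detail of the exports above: if LD_LIBRARY_PATH or PKG_CONFIG_PATH is unset beforehand, the new value starts with a colon, which leaves an empty entry in the list. For LD_LIBRARY_PATH an empty entry means the current working directory. Below is a small sketch that avoids this. The helper name append_path is an assumption for illustration, not a standard command.

```shell
# Append a directory to a colon-separated path value without leaving an
# empty entry and without adding a duplicate. Prints the new value.
append_path() {
    _current=$1
    _dir=$2
    case ":$_current:" in
        *":$_dir:"*) printf '%s\n' "$_current" ;;        # already present
        "::")        printf '%s\n' "$_dir" ;;            # value was empty
        *)           printf '%s\n' "$_current:$_dir" ;;  # normal append
    esac
}

# Same directories as in the step above, set through the helper.
export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/lib)"
export PKG_CONFIG_PATH="$(append_path "${PKG_CONFIG_PATH:-}" /usr/local/lib/pkgconfig)"
```

Running the two exports twice leaves each variable unchanged, which helps when the lines end up in a shell profile.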

Step 4: generate the configure script

./autogen.sh

Result:

[root@tutorialspots tesseract-4.1.1]# ./autogen.sh
Running aclocal
Running /usr/bin/libtoolize
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
Running autoconf
Running autoheader
Running automake --add-missing --copy
configure.ac:86: installing 'config/config.guess'
configure.ac:86: installing 'config/config.sub'
configure.ac:27: installing 'config/install-sh'
configure.ac:27: installing 'config/missing'
src/api/Makefile.am: installing 'config/depcomp'
parallel-tests: installing 'config/test-driver'

All done.
To build the software now, do something like:

$ ./configure [--enable-debug] [...other options]
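
If autogen.sh stops early instead of printing "All done.", a likely cause is a missing build tool. The output above shows it calling aclocal, libtoolize, autoconf, autoheader and automake. Below is a minimal sketch for spotting which of them are absent. The function name missing_tools is hypothetical.

```shell
# Print each named command that cannot be found on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
    for _tool in "$@"; do
        command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
    done
}
```

For example, missing_tools aclocal libtoolize autoconf autoheader automake pkg-config lists whatever still needs to be installed. On CentOS 7 these tools typically come from the autoconf, automake, libtool and pkgconfig packages.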

Step 5: configure the build

./configure --prefix= --with-extra-libraries=/usr/local/lib

Result:

...
config.status: creating java/Makefile
config.status: creating java/com/Makefile
config.status: creating java/com/google/Makefile
config.status: creating java/com/google/scrollview/Makefile
config.status: creating java/com/google/scrollview/events/Makefile
config.status: creating java/com/google/scrollview/ui/Makefile
config.status: creating doc/Makefile
config.status: creating config_auto.h
config.status: executing depfiles commands
config.status: executing libtool commands

Configuration is done.
You can now build and install tesseract by running:

$ make
$ sudo make install
$ sudo ldconfig

Documentation will not be built because asciidoc or xsltproc is missing.

You can not build training tools because of missing dependency.
Check configure output for details.
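
The empty --prefix= deserves a second look, because the install directories are derived from it. Below is a sketch of the resulting layout, assuming the usual autoconf defaults (bindir is prefix/bin, libdir is prefix/lib, datadir is prefix/share) and that tessdata goes under datadir, as the '/share/tessdata' path in the install output suggests. install_dirs is a hypothetical helper and not part of the Tesseract build.

```shell
# Print the directories an autoconf-style install would use for a prefix,
# assuming the default bindir, libdir and datadir settings.
install_dirs() {
    _prefix=$1
    printf 'bindir=%s\n' "$_prefix/bin"
    printf 'libdir=%s\n' "$_prefix/lib"
    printf 'tessdata=%s\n' "$_prefix/share/tessdata"
}
```

install_dirs "" shows the layout used in this article (/bin, /lib, /share/tessdata), while install_dirs /usr/local shows what --prefix=/usr/local would give, which keeps Tesseract next to Leptonica.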

Step 6: build and install

make -j"$(nproc)" install

Result:

...
make[3]: Leaving directory `/root/tesseract-4.1.1/tessdata/tessconfigs'
make[2]: Leaving directory `/root/tesseract-4.1.1/tessdata/tessconfigs'
make[2]: Entering directory `/root/tesseract-4.1.1/tessdata'
make[3]: Entering directory `/root/tesseract-4.1.1/tessdata'
make[3]: Nothing to be done for `install-exec-am'.
 /usr/bin/mkdir -p '/share/tessdata'
 /usr/bin/install -c -m 644 pdf.ttf '/share/tessdata'
make[3]: Leaving directory `/root/tesseract-4.1.1/tessdata'
make[2]: Leaving directory `/root/tesseract-4.1.1/tessdata'
make[1]: Leaving directory `/root/tesseract-4.1.1/tessdata'
Making install in doc
make[1]: Entering directory `/root/tesseract-4.1.1/doc'
make[2]: Entering directory `/root/tesseract-4.1.1/doc'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Leaving directory `/root/tesseract-4.1.1/doc'
make[1]: Leaving directory `/root/tesseract-4.1.1/doc'
Making install in unittest
make[1]: Entering directory `/root/tesseract-4.1.1/unittest'
make[2]: Entering directory `/root/tesseract-4.1.1/unittest'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory `/root/tesseract-4.1.1/unittest'
make[1]: Leaving directory `/root/tesseract-4.1.1/unittest'
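
The install output shows only pdf.ttf and the config files being copied into /share/tessdata, and no language model appears in it. Tesseract needs a .traineddata file for each language at run time, so a quick check after installing can save a confusing first run. Below is a minimal sketch. The function name has_traineddata is made up for illustration, and the directory is taken from the log above.

```shell
# Succeed if <dir>/<lang>.traineddata exists and is not empty.
has_traineddata() {
    [ -s "$1/$2.traineddata" ]
}
```

For example, has_traineddata /share/tessdata eng || echo "eng.traineddata is missing" reports whether the English model still has to be downloaded. Running ldconfig, as the configure output suggests, refreshes the linker cache, and tesseract --version and tesseract --list-langs then confirm that the binary starts and which languages it can see.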
