How to install Fulltextsearch in Nextcloud with Elasticsearch and Tesseract OCR

Today I want to show you how to install Fulltextsearch with OCR in Nextcloud 13 or 14 with Elasticsearch and Tesseract OCR. This guide also works with Plesk Onyx.

Prerequisites:

– an Ubuntu VPS with a running Nextcloud instance
– root access

The following apps must be installed and enabled in Nextcloud:

– Full text search – Bookmarks (BETA)
– Full text search – Elasticsearch Platform (BETA)
– Full text search – Files – Tesseract OCR (BETA)
– Full text search – Files (BETA)
– Full text search (BETA)

Step 1: Install Elasticsearch

We have to install Java:

sudo apt-get install openjdk-8-jre
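
Elasticsearch 6.x needs at least Java 8, so it can be worth checking which version is actually active after the install. Here is a minimal sketch of such a check; the helper name java_major is made up for this example, and it assumes the usual version string formats such as "1.8.0_181" or "11.0.2".

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the major Java version
# for a version string such as "1.8.0_181" (-> 8) or "11.0.2" (-> 11).
java_major() {
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;   # legacy "1.x" numbering
        *)   printf '%s\n' "$1" | cut -d. -f1 ;;   # newer numbering
    esac
}

# Only runs when java is on the PATH; `java -version` writes to stderr.
if command -v java >/dev/null 2>&1; then
    version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    echo "Java major version: $(java_major "$version")"
fi
```

If the printed number is lower than 8, Elasticsearch will most likely refuse to start.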

and we add the elasticsearch repository:

sudo apt install apt-transport-https
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
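
On newer Ubuntu releases apt-key is deprecated, and a dedicated keyring referenced with signed-by is the usual replacement. The following is only a rough sketch of that variant; the keyring path and the helper name es_repo_line are assumptions chosen for this example.

```shell
#!/bin/sh
# Assumed keyring location; any root-owned path that apt can read should work.
KEYRING=/usr/share/keyrings/elasticsearch-keyring.gpg

# Hypothetical helper: print the sources.list entry for a release series
# (e.g. "6.x"), pinned to the keyring above.
es_repo_line() {
    printf 'deb [signed-by=%s] https://artifacts.elastic.co/packages/%s/apt stable main\n' \
        "$KEYRING" "$1"
}

# To apply it (needs root and network access, so left commented out here):
# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o "$KEYRING"
# es_repo_line 6.x | sudo tee /etc/apt/sources.list.d/elasticsearch.list
# sudo apt update
```

Using tee without -a overwrites the list file, which avoids duplicate entries if the commands are run a second time.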

now we can install elasticsearch:

sudo apt install elasticsearch

for security reasons, we bind Elasticsearch to 127.0.0.1:

sudo nano /etc/elasticsearch/elasticsearch.yml

add or adjust the following line:

network.host: 127.0.0.1
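
If you prefer to script this change instead of editing the file by hand, something like the following could work. It is a sketch under assumptions: the helper name set_yaml_key is invented here, it relies on GNU sed (as shipped with Ubuntu), and it only handles simple top-level "key: value" lines.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a flat YAML file.
# An existing uncommented line for the key is replaced, otherwise a new
# line is appended. Commented-out defaults ("#key: ...") are left alone.
set_yaml_key() {
    file=$1; key=$2; value=$3
    if grep -q "^${key}:" "$file"; then
        sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
    else
        printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
}

# Example, to be run from a root shell (left commented out here):
# set_yaml_key /etc/elasticsearch/elasticsearch.yml network.host 127.0.0.1
```

Running it twice leaves a single network.host line, so it is safe to repeat.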

and enable the service:

sudo systemctl daemon-reload && sudo systemctl enable elasticsearch && sudo systemctl start elasticsearch

check if elasticsearch is running:

curl -XGET '127.0.0.1:9200/?pretty'
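
Elasticsearch can need some time after the start before it answers on port 9200, so a curl right after starting the service may fail with "connection refused". A small retry loop helps here. This is just a sketch; the helper name retry and the numbers in the example are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" >/dev/null 2>&1 && return 0          # success: stop retrying
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then sleep "$delay"; fi
    done
    return 1                                      # every attempt failed
}

# Example (left commented out because it needs a running Elasticsearch):
# retry 30 2 curl -sf 127.0.0.1:9200 && echo "Elasticsearch is up"
```

With the example values the check gives up after roughly one minute.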

and install the "ingest-attachment" plugin (required for PDF, PPT, XLS etc.):

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment

now restart Elasticsearch to finish this step:

sudo systemctl restart elasticsearch
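
To make sure the plugin really is installed, you can look for it in the output of elasticsearch-plugin list. A minimal sketch, assuming the listing prints one plugin name per line; the helper name has_plugin is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the plugin name given as $1 appears as a
# whole line in the plugin listing read from standard input.
has_plugin() {
    grep -qxF "$1"
}

# Example (left commented out because it needs an Elasticsearch install):
# /usr/share/elasticsearch/bin/elasticsearch-plugin list | has_plugin ingest-attachment \
#     && echo "ingest-attachment is installed"
```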

 

Step 2: Install Tesseract OCR

we can install Tesseract OCR with the following command:

sudo apt install tesseract-ocr

now we have to install additional languages (in this example English, German and French):

sudo apt install tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra

if you want to install all languages, use the following command:

sudo apt install tesseract-ocr-all
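
To see which languages Tesseract actually knows after the install, tesseract --list-langs prints them. The sketch below compares that list with the languages you need. The helper name missing_lang_packages is an assumption, and so is the package naming scheme, which is only taken from the three examples above (tesseract-ocr-<code>).

```shell
#!/bin/sh
# Hypothetical helper: read `tesseract --list-langs` output on stdin and print
# the assumed apt package name for every requested language code that is
# missing. The header line of the listing never matches a language code.
missing_lang_packages() {
    installed=$(cat)
    for code in "$@"; do
        if ! printf '%s\n' "$installed" | grep -qxF "$code"; then
            echo "tesseract-ocr-$code"
        fi
    done
}

# Example (left commented out because it needs tesseract installed).
# Older tesseract versions print the list to stderr, hence the 2>&1:
# tesseract --list-langs 2>&1 | missing_lang_packages eng deu fra
```

No output means that all requested languages are available.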

 

Step 3: Configure Nextcloud

Under "Administration -> Full text search" we adjust the following settings:

Step 4: Create index

to create the index we have to run the following command in our Nextcloud directory:

sudo -u www-data php ./occ fulltextsearch:index
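
The command only works when it is started from the directory that contains occ. If you want to call it from somewhere else (a script, for example), a small wrapper can check the path first. This is only a sketch; the helper name index_command and the example path are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the index command for a Nextcloud directory,
# or fail if that directory does not contain the occ script.
index_command() {
    dir=${1%/}                                   # drop a trailing slash
    if [ ! -f "$dir/occ" ]; then
        echo "no occ found in $dir" >&2
        return 1
    fi
    printf 'sudo -u www-data php %s/occ fulltextsearch:index\n' "$dir"
}

# Example (the path is an assumption, adjust it to your installation):
# index_command /var/www/nextcloud
```

The function only prints the command, so you can review it before running it.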

the first run will take some time, but once it has finished you can use the full text search.

That’s it, have fun and share!
