
How to install Fulltextsearch in Nextcloud with Elasticsearch and Tesseract OCR

Today I want to show you how to install Fulltextsearch with OCR in Nextcloud 13 or 14 with Elasticsearch and Tesseract OCR. This guide also works with Plesk Onyx.

Prerequisites:

– an Ubuntu VPS with a running Nextcloud instance
– root access

The following apps must be installed and enabled in Nextcloud:

– Full text search – Bookmarks (BETA)
– Full text search – Elasticsearch Platform (BETA)
– Full text search – Files – Tesseract OCR (BETA)
– Full text search – Files (BETA)
– Full text search (BETA)

Step 1: Install Elasticsearch

We have to install Java:

sudo apt-get install openjdk-8-jre

Then we add the Elasticsearch repository:

sudo apt install apt-transport-https
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
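On newer Ubuntu releases apt-key is deprecated. As a rough sketch of an alternative, the repository line can be tied to a dedicated keyring with signed-by. The helper name elastic_source_line and the keyring path below are assumptions for illustration, not part of the original setup:

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line for the Elasticsearch 6.x
# repository, bound to a dedicated keyring via signed-by.
# The default keyring path is an assumption, adjust it to your system.
elastic_source_line() {
    keyring="${1:-/usr/share/keyrings/elasticsearch-keyring.gpg}"
    printf 'deb [signed-by=%s] https://artifacts.elastic.co/packages/6.x/apt stable main\n' "$keyring"
}

# Manual usage (needs root and network access):
#   wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch \
#     | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
#   elastic_source_line | sudo tee /etc/apt/sources.list.d/elasticsearch.list
```

If you go this way, use it instead of the apt-key and echo lines, not in addition to them, otherwise the repository ends up listed twice.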

Now we can install Elasticsearch:

sudo apt install elasticsearch

For security reasons, we bind Elasticsearch to 127.0.0.1:

sudo nano /etc/elasticsearch/elasticsearch.yml

Add or adjust the following line:

network.host: 127.0.0.1
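If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: set_network_host is a made-up helper name, and it assumes GNU sed (the default on Ubuntu) and at most one existing network.host line in the file:

```shell
#!/bin/sh
# Hypothetical helper: set network.host in an elasticsearch.yml-style file.
# An existing network.host line (also a commented-out one) is rewritten,
# otherwise the setting is appended at the end of the file.
set_network_host() {
    file="$1"
    host="${2:-127.0.0.1}"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*network\.host:' "$file"; then
        # Assumes the file contains only one such line (as the stock config does)
        sed -i -E "s|^[[:space:]]*#?[[:space:]]*network\.host:.*|network.host: ${host}|" "$file"
    else
        printf 'network.host: %s\n' "$host" >> "$file"
    fi
}
```

Run from a root shell it would be called as: set_network_host /etc/elasticsearch/elasticsearch.yml 127.0.0.1. Calling it twice leaves the file unchanged the second time.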

Then enable and start the service:

sudo systemctl daemon-reload && sudo systemctl enable elasticsearch && sudo systemctl start elasticsearch

Check whether Elasticsearch is running:

curl -XGET '127.0.0.1:9200/?pretty'
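Elasticsearch usually needs a few seconds after the start before it answers on port 9200, so the first curl may fail with "connection refused". A small polling loop can help here. The function name wait_for_es and its default values are my own choice, not something Elasticsearch ships:

```shell
#!/bin/sh
# Hypothetical helper: poll an Elasticsearch HTTP endpoint until it answers.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_es() {
    url="${1:-http://127.0.0.1:9200}"
    tries="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: no progress output, -f: HTTP errors give a non-zero exit status
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example: wait_for_es http://127.0.0.1:9200 30 2 && echo "Elasticsearch is up". If the loop gives up, sudo journalctl -u elasticsearch is a good place to look for the reason.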

Next, install the "ingest-attachment" plugin (required for indexing PDF, PPT, XLS etc.):

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
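To confirm that the plugin really got installed, the output of elasticsearch-plugin list can be checked. A minimal sketch, with has_plugin being a hypothetical helper that reads the list from stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the plugin name given as $1 appears as a
# whole line in the plugin list read from stdin.
has_plugin() {
    grep -qx "$1"
}

# Manual usage:
#   sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list | has_plugin ingest-attachment \
#     && echo "ingest-attachment is installed"
```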

Now restart Elasticsearch, and this step is done:

sudo systemctl restart elasticsearch

 

Step 2: Install Tesseract OCR

We can install Tesseract OCR with the following command:

sudo apt install tesseract-ocr

Now we have to install additional languages (in this example English, German and French):

sudo apt install tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra

If you want to install all languages, use the following command:

sudo apt install tesseract-ocr-all
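Whether the language packs arrived can be checked with tesseract --list-langs. The sketch below compares that output with the languages you expect. missing_langs is a hypothetical helper name, and the language codes are just the three from the example above:

```shell
#!/bin/sh
# Hypothetical helper: read the output of `tesseract --list-langs` on stdin
# and print every requested language code that does not appear in it.
missing_langs() {
    installed="$(cat)"
    for lang in "$@"; do
        # -x matches whole lines only, so "eng" does not match a header line
        if ! printf '%s\n' "$installed" | grep -qx "$lang"; then
            printf '%s\n' "$lang"
        fi
    done
}
```

For example: tesseract --list-langs 2>&1 | missing_langs eng deu fra. The 2>&1 is there because some Tesseract versions print the list to stderr. No output means all three languages are available.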

 

Step 3: Configure Nextcloud

Under "Administration -> Full text search" we adjust the following settings:

Step 4: Create index

To create the index, we have to run the following command in our Nextcloud directory:

sudo -u www-data php ./occ fulltextsearch:index
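The command only works when it is started from the directory that contains occ. A small wrapper can catch that mistake before anything is run. This is a sketch: run_index is a made-up name, and www-data as the web server user is an assumption that may not hold on every setup:

```shell
#!/bin/sh
# Hypothetical wrapper: run the index command from a given Nextcloud directory.
# Arguments: Nextcloud directory (default: current directory),
#            web server user (default: www-data, an assumption).
run_index() {
    nc_dir="${1:-.}"
    web_user="${2:-www-data}"
    if [ ! -f "$nc_dir/occ" ]; then
        echo "occ not found in $nc_dir" >&2
        return 1
    fi
    # Subshell so the caller's working directory stays untouched
    (cd "$nc_dir" && sudo -u "$web_user" php ./occ fulltextsearch:index)
}
```

For example: run_index /var/www/nextcloud, where the path is only a placeholder for your own installation directory.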

The first run will take some time, but once it has finished you can use the full-text search.

That’s it, have fun and share!

8 thoughts on "How to install Fulltextsearch in Nextcloud with Elasticsearch and Tesseract OCR"

  1. If the service does not start, the server may have too little RAM. In that case, lower the memory setting in /etc/elasticsearch/jvm.options, for example to 512m.
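The memory setting in jvm.options is the pair of -Xms and -Xmx lines, which should both get the same value. A minimal sketch of changing them in one go, assuming GNU sed and the stock layout with one -Xms and one -Xmx line (set_heap_size is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: set initial (-Xms) and maximum (-Xmx) JVM heap size
# in a jvm.options-style file. Both get the same value.
# Arguments: file, size (default: 512m).
set_heap_size() {
    file="$1"
    size="${2:-512m}"
    sed -i -E \
        -e "s|^-Xms[0-9]+[kmgKMG]\$|-Xms${size}|" \
        -e "s|^-Xmx[0-9]+[kmgKMG]\$|-Xmx${size}|" \
        "$file"
}
```

Run from a root shell: set_heap_size /etc/elasticsearch/jvm.options 512m, followed by sudo systemctl restart elasticsearch so the new heap size takes effect.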
