Today I want to show you how to install Fulltextsearch with OCR in Nextcloud 13 or 14 with Elasticsearch and Tesseract OCR. This guide also works with Plesk Onyx.
Prerequisites:
– an Ubuntu VPS with a running Nextcloud instance
– root access
the following Apps must be installed and activated in Nextcloud:
– Full text search – Bookmarks (BETA)
– Full text search – Elasticsearch Platform (BETA)
– Full text search – Files – Tesseract OCR (BETA)
– Full text search – Files (BETA)
– Full text search (BETA)
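If you prefer the command line over the app store, the apps can usually be enabled with occ as well. The following is only a sketch: the app ids below are my assumption of what the five apps are called internally, so please check them against the output of "occ app:list" first. The script only prints the commands (a dry run), so nothing is changed until you run them yourself in your nextcloud-directory.

```shell
#!/bin/sh
# Assumed app ids for the five apps listed above (verify with: occ app:list)
APPS="fulltextsearch fulltextsearch_elasticsearch files_fulltextsearch files_fulltextsearch_tesseract bookmarks_fulltextsearch"

# Print one "occ app:enable" command per app instead of executing it,
# so the list can be reviewed before anything is changed.
print_enable_commands() {
    for app in $APPS; do
        echo "sudo -u www-data php ./occ app:enable $app"
    done
}

print_enable_commands
```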
Step 1: Install Elasticsearch
We have to install Java:
sudo apt-get install openjdk-8-jre
and we add the elasticsearch repository:
sudo apt install apt-transport-https
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
now we can install elasticsearch:
sudo apt install elasticsearch
for security reasons, we bind elasticsearch to 127.0.0.1:
sudo nano /etc/elasticsearch/elasticsearch.yml
add or adjust the following line:
network.host: 127.0.0.1
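If you want to script this change instead of using an editor, something like the following could work. It is a minimal sketch: the function name set_network_host is made up for this example, and it assumes the file has at most one uncommented "network.host:" line. It replaces that line if it exists and appends a new one otherwise; commented lines are left alone.

```shell
#!/bin/sh
# set_network_host FILE HOST (hypothetical helper, not part of Elasticsearch)
# Replaces an existing uncommented "network.host:" line or appends a new one.
set_network_host() {
    file="$1"
    host="$2"
    if grep -Eq '^[[:space:]]*network\.host:' "$file"; then
        tmp="$(mktemp)"
        sed -E "s|^[[:space:]]*network\.host:.*|network.host: $host|" "$file" > "$tmp"
        # Write back with cat so the owner and permissions of FILE are kept
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        echo "network.host: $host" >> "$file"
    fi
}

# Usage (in a root shell):
#   set_network_host /etc/elasticsearch/elasticsearch.yml 127.0.0.1
```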
and enable the service:
sudo systemctl daemon-reload && sudo systemctl enable elasticsearch && sudo systemctl start elasticsearch
check if elasticsearch is running:
curl -XGET '127.0.0.1:9200/?pretty'
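Elasticsearch can need a little while after the start before it answers, so the first curl may fail with "connection refused". A small retry loop helps here. This is just a sketch; the function name wait_for_es and the defaults (30 tries, one second pause) are my own choice and not anything Elasticsearch prescribes.

```shell
#!/bin/sh
# wait_for_es [URL] [TRIES] (hypothetical helper)
# Polls the URL with curl until it answers successfully or TRIES is used up.
wait_for_es() {
    url="${1:-http://127.0.0.1:9200}"
    tries="${2:-30}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f: treat HTTP errors as failure, -sS: quiet but still report errors
        if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$i" -lt "$tries" ]; then
            sleep 1
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   wait_for_es && curl -XGET '127.0.0.1:9200/?pretty'
```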
and install the „ingest-attachment“ plugin (required for PDF, PPT, XLS etc.):
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
now restart elasticsearch and we are ready with this step:
sudo systemctl restart elasticsearch
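To make sure the plugin was really installed, you can look at the output of "elasticsearch-plugin list". The little helper below is a sketch that assumes the list prints one plugin name per line; the function name has_plugin is made up for this example.

```shell
#!/bin/sh
# has_plugin NAME (hypothetical helper)
# Reads a plugin list from stdin and succeeds if NAME is one of the lines.
has_plugin() {
    grep -Fqx "$1"
}

# Usage:
#   sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list | has_plugin ingest-attachment \
#       && echo "ingest-attachment is installed"
```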
Step 2: Install Tesseract OCR
we can install Tesseract OCR with the following command:
sudo apt install tesseract-ocr
now we have to install additional languages (in this example English, German and French):
sudo apt install tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra
if you want to install all languages, use the following command:
sudo apt install tesseract-ocr-all
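You can check which languages Tesseract actually knows with "tesseract --list-langs". The following sketch compares that list with the languages you expect and prints the ones that are missing. The function name missing_langs is my own, and I redirect stderr in the usage line because, as far as I know, older Tesseract versions print the list there.

```shell
#!/bin/sh
# missing_langs LANG... (hypothetical helper)
# Reads the output of "tesseract --list-langs" from stdin and prints every
# requested language code that does not appear as a line of its own.
missing_langs() {
    available="$(cat)"
    for lang in "$@"; do
        if ! printf '%s\n' "$available" | grep -Fqx "$lang"; then
            echo "$lang"
        fi
    done
}

# Usage:
#   tesseract --list-langs 2>&1 | missing_langs eng deu fra
```

If nothing is printed, all requested languages are available.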
Step 3: Configure Nextcloud
Under „Administration -> Full text search“ we adjust the following settings:
Step 4: Create index
to create the index we have to run the following command in our nextcloud-directory:
sudo -u www-data php ./occ fulltextsearch:index
the first run will take some time, but once it has finished you can use the full text search.
That’s it, have fun and share!
10 thoughts on „How to install Fulltextsearch in Nextcloud with Elasticsearch and Tesseract OCR“
I installed everything as described and it worked perfectly with all local files.
But the search doesn’t show any external files from a cifs share.
When I run the indexing on the command line I can see that elasticsearch is indexing my external cifs files, but nothing shows up when I search in nextcloud.
Does anyone know what the problem is? I can’t seem to find a log file showing possible search errors.
Thanks
please look here: https://github.com/nextcloud/fulltextsearch/issues/301
If the service does not start, you may have too little RAM; in that case you have to go down to e.g. 512m in /etc/elasticsearch/jvm.options.
Thanks for the tip
If a German blog is ok – here is what I did to run nextcloud in docker: https://nerdblog.steinkopf.net/2018/03/nextcloud-mit-docker/ (Feedback welcome)
P.S. Info about elasticsearch is following in the blog (sorry I forgot that it is not yet online).
Now it is online: https://nerdblog.steinkopf.net/2018/07/nextcloud-volltext-index-mit-docker-und-elasticsearch/ (Tesseract follows).
Hi,
Is there a way to package all this with Docker containers?
Hi,
I’m working on this
Regards