

This page contains the code for the Auto Archiver in its most stable and up-to-date version. Below is the main script that creates the user interface on the Archive Machine. The other scripts it calls are also available as a download from this page. Please test the software before updating the Archive Machine and before uploading it to this page.


Note: Most of these scripts will only work on machines that can run Linux. Scanning relies on the scanimage tool, which is not available on OS X or Windows.

You will need to download the scripts from the bottom of this page and extract them into any working folder you like. Before you can use these tools, follow the instructions below to install and configure all the dependencies. First, you will need to install pip, which is a Python package manager.

All installation instructions included are for the Linux distribution Debian.


All the scripts in the scripts folder can be run directly from the command line. They behave as they do when called by the main script, but let you carry out the individual tasks of connecting to the archive, scanning, translating, and uploading on their own. Each script comes with its own help text so that you can see how to interact with it. Simply type the following :

<syntaxhighlight lang="bash"> python -h </syntaxhighlight>


pip is necessary for the installation of some Python libraries that these scripts depend upon. To install it, run the following command :

sudo apt install python-pip

After you have successfully installed pip, you can install any Python library by typing :

pip install [package name]

MediaWiki Client

The MediaWiki client (mwclient) lets the scripts connect to the archive. Without it installed, none of your archival work will upload to the server. To install this library, follow the steps below.


To install MediaWiki client you will need to run the following command :

pip install mwclient
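Before moving on, it can help to confirm that the library is actually importable by the Python interpreter you intend to use. The following is a minimal sketch, assuming Python 3; the helper name `missing_modules` is made up for illustration and is not part of the AutoArchiver scripts.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # mwclient is the only library named on this page; extend the list as needed.
    for name in missing_modules(["mwclient"]):
        print("Missing: {0} (try: pip install {0})".format(name))
```

If nothing is printed, the listed modules were found.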

Once mwclient is installed, you will still need to create the configuration file in the AutoArchiver's location on your computer. To do this, navigate to the folder that contains the AutoArchiver application. Then copy the config.sample.cfg file to config.cfg by doing the following :

cp config.sample.cfg config.cfg

You will then need to update the parameters within this file for your scripts to successfully connect to the archive. The specific details you will need can be found by asking an archiver, but below you can find what each of the parameters means.

url: The URL of the wiki (example.local or similar)
path: The path to the wiki (if the wiki is served directly at the URL, enter "/" without the quotes; if it is served under a subpath, enter that subpath, for example "/cheese")
login: The AutoArchiver Bot user (default is AutoArchiver)
pass: The AutoArchiver Bot password (talk to someone in the know)

url : This needs to be the valid URL of the Archive. If it is being hosted locally, you will need to provide something like localhost, plus a port number if necessary.
path : If your archive is hosted under a path within another domain, you will need to specify that path. For example, if the archive is served under /archive, this field will be /archive. If it is simply at the base URL, then it will be /.
login : This is the username of the AutoArchiver bot. This will probably be AutoArchiver, but it can also be any other valid user account.
pass : This is the password for the account name provided above.
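To illustrate how these four parameters fit together, here is a minimal Python 3 sketch that reads them and reports any that are missing. It assumes config.cfg consists of flat `key: value` lines with no section headers; the real file layout may differ, so follow the instructions inside config.sample.cfg if they disagree. The function name `parse_archive_config` and the dummy section name `archive` are hypothetical.

```python
import configparser

REQUIRED_KEYS = ("url", "path", "login", "pass")


def parse_archive_config(text):
    """Parse flat `key: value` lines and return the four connection settings."""
    # interpolation=None keeps characters such as % in passwords untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # configparser needs a section header, so wrap the flat text in a dummy one.
    parser.read_string("[archive]\n" + text)
    section = parser["archive"]
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise ValueError("config.cfg is missing: " + ", ".join(missing))
    return {key: section[key].strip() for key in REQUIRED_KEYS}
```

A typical call would be `parse_archive_config(open("config.cfg").read())`, which returns a dictionary with the keys url, path, login, and pass.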

Once all this has been set up correctly, the AutoArchiver and Archive to Print scripts will be able to connect to the archive.


Tesseract needs to be installed directly on your machine so that it is accessible from the command line. To do this, you will need to follow the instructions specific to your computer's distribution.


Installation on Debian is a simple process, as Tesseract and its language packs are available from the standard Debian software repositories. To install Tesseract simply type :

sudo apt install tesseract-ocr

You will then need to install the languages you wish to convert your files to and from. To get a list of available packages you can type the following :

sudo apt search tesseract-ocr

The language packs are named in the format tesseract-ocr-language code. When you have found the languages you wish to install, type the following into your terminal (replacing language code with the appropriate code) :

sudo apt install tesseract-ocr-language code


Once your installation has completed without errors, you can use Tesseract both as part of the AutoArchiver tools and directly from the command line for its OCR abilities. For example :


If you want to see a list of the languages available to your Tesseract installation, type the following :

tesseract --list-langs
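The same check can be scripted, for instance to find out which language packs still need installing. The sketch below is a hedged Python 3.7+ example and not part of the AutoArchiver scripts; all function names are hypothetical. It assumes the command prints a "List of available languages" header followed by one code per line, and that Debian package names use hyphens where language codes use underscores, which is worth confirming with apt search.

```python
import subprocess


def parse_lang_list(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the "List of available languages ..." header line.
    return [line for line in lines if line and not line.lower().startswith("list of")]


def installed_languages():
    """Run tesseract and return the installed language codes."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Some Tesseract releases print the list on stderr rather than stdout,
    # so both streams are parsed; stray warnings would need filtering out.
    return parse_lang_list(result.stdout + result.stderr)


def missing_packages(wanted, installed):
    """Map wanted-but-missing language codes to assumed Debian package names."""
    return [
        "tesseract-ocr-" + code.replace("_", "-")
        for code in wanted
        if code not in installed
    ]
```

For example, `missing_packages(["eng", "fra"], installed_languages())` would list the packages to pass to apt install.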


If you plan on scanning, you will need to plug in a scanner of your choosing. To check that it will be detectable by the AutoArchiver tools, run the following command in your terminal.

<syntaxhighlight lang="bash"> scanimage -L </syntaxhighlight>

This will list all detected scanners. If none appear in the list, and your scanner is plugged in and turned on, then sadly it may not be compatible with this command line tool.
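If you want to perform this check from Python, the following sketch runs the command and extracts the device names. It is an illustration only (Python 3.7+), with hypothetical function names, and it assumes each detected scanner is reported on a line of the form: device `backend:address' is a Vendor Model flatbed scanner.

```python
import re
import subprocess

# Assumed line format: device `backend:address' is a Vendor Model flatbed scanner
DEVICE_LINE = re.compile(r"^device `(?P<name>[^']+)' is an? (?P<description>.+)$")


def parse_scanner_list(output):
    """Return (device name, description) pairs found in `scanimage -L` output."""
    devices = []
    for line in output.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices.append((match.group("name"), match.group("description")))
    return devices


def detected_scanners():
    """Run `scanimage -L` and return the scanners it reports."""
    result = subprocess.run(["scanimage", "-L"], capture_output=True, text=True)
    return parse_scanner_list(result.stdout)
```

An empty list from `detected_scanners()` corresponds to the case described above, where no scanner appears in the output.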


At this time there is a limitation to the translation tool. It is built upon a translation library for Python that freely accesses the Google Translate API. This API is no longer a free service, and as such this tool will stop working after a few uses based on your IP address.

Further work is required to change this system. At this time, the only seemingly viable option is using the Microsoft Bing translation engine which is still a free service.


Connection to the Archive is done via the Media Wiki client library. This is a non-standard library for Python that will need to be installed on your system. You will need to have pip package manager installed to do this. Once pip is on your system install the Media Wiki client library for Python by typing the following into your terminal :

<syntaxhighlight lang="bash"> pip install mwclient </syntaxhighlight>

For the uploading functionality to work, you will need to fill in the configuration file (config.cfg, copied from config.sample.cfg) with the appropriate details. Instructions are included in the file itself.

Once you have filled in the appropriate details, you may test the connection by running the script from the command line. For example, if you are in the folder where you extracted the AutoArchiver, then run :

<syntaxhighlight lang="bash"> python scripts/ </syntaxhighlight>

This script will inform you if you have successfully connected to the Archive machine.
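For reference, a connection check of this kind can be sketched with mwclient as shown below. This is not the bundled script, only a rough illustration under stated assumptions: a recent mwclient release whose `Site` constructor accepts the `scheme` keyword, an archive served over plain http, and a `settings` dictionary holding the four config values. The function names are hypothetical.

```python
def site_arguments(url, path):
    """Build the host and keyword arguments for mwclient.Site."""
    # mwclient expects the script path to end with a slash, e.g. "/archive/".
    if not path.endswith("/"):
        path += "/"
    # "http" is an assumption for a locally hosted archive; use "https" if needed.
    return url, {"path": path, "scheme": "http"}


def check_connection(settings):
    """Log in with the settings from config.cfg; raises if the login fails."""
    import mwclient  # imported here so site_arguments works without the library

    host, kwargs = site_arguments(settings["url"], settings["path"])
    site = mwclient.Site(host, **kwargs)
    site.login(settings["login"], settings["pass"])
    return site
```

If `check_connection` returns without raising an error, the URL, path, and bot credentials were accepted by the wiki.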

