Vivaldi document database diagnostics

March 26, 2019

To restrict access to documents, the Vivaldi server displays the pages a user has been granted permission to view as graphic images rather than as PDF documents. Converting document pages into graphic images is called rasterisation. It consumes CPU time and RAM and places a significant load on the server.

Server performance degrades when the database holds many documents, when documents are frequently updated or new copies are added, and when many readers work simultaneously. The load on the hardware also depends on the quality and validity of the files in storage. Opening documents that were scanned at excessive resolution and/or saved without additional optimisation (image compression) puts excessive load on the processor and RAM. When many users work at the same time, the server's response time grows, which inconveniences them. To monitor the state of the database and detect problem files early, a dedicated software solution was designed and developed in the form of a Windows service.

When the administrator launches the service for the first time, it goes through all of the documents stored in the database, measures the rasterisation time of each page and saves the results. If a page's rasterisation time exceeds a predefined threshold, the document is flagged as problematic and a corresponding entry is recorded in the local database.
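A minimal Python sketch of this first pass, assuming each document can be described by an id and a page count and that rendering a page is exposed as a callback. The names `scan_documents` and `rasterize_page`, the `page_timing`/`problem_page` tables and the threshold value are all hypothetical; the article does not describe the service's real schema or threshold, and SQLite merely stands in for the local database.

```python
import sqlite3
import time
from typing import Callable, Iterable

# Hypothetical threshold; the real service takes its limit from its own settings.
THRESHOLD_SECONDS = 5.0

def scan_documents(
    documents: Iterable[tuple[str, int]],
    rasterize_page: Callable[[str, int], None],
    db: sqlite3.Connection,
    threshold: float = THRESHOLD_SECONDS,
) -> list[str]:
    """Time every page of every document and record slow pages in `db`.

    documents: (doc_id, page_count) pairs.
    rasterize_page: hypothetical callback that renders a single page.
    Returns the ids of documents flagged as problematic.
    """
    db.execute("CREATE TABLE IF NOT EXISTS page_timing "
               "(doc_id TEXT, page INTEGER, seconds REAL, PRIMARY KEY (doc_id, page))")
    db.execute("CREATE TABLE IF NOT EXISTS problem_page "
               "(doc_id TEXT, page INTEGER, PRIMARY KEY (doc_id, page))")
    problems = []
    for doc_id, page_count in documents:
        slow = False
        for page in range(1, page_count + 1):
            start = time.perf_counter()
            rasterize_page(doc_id, page)
            elapsed = time.perf_counter() - start
            # Keep the timing of every page so later runs can compare results.
            db.execute("INSERT OR REPLACE INTO page_timing VALUES (?, ?, ?)",
                       (doc_id, page, elapsed))
            if elapsed > threshold:
                db.execute("INSERT OR REPLACE INTO problem_page VALUES (?, ?)",
                           (doc_id, page))
                slow = True
        if slow:
            problems.append(doc_id)
    db.commit()
    return problems
```

A document is flagged as soon as one of its pages crosses the threshold, while every slow page is still recorded individually, which is what the per-page entries in the statistics require.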

When the service is restarted, it goes through the entire database again, looking for changes made since it was last run. Previously analysed documents are skipped if their last modification date is earlier than the date of the previous data collection.
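The restart rule can be expressed as a simple filter. In this sketch `documents_to_process` and the `analysed` set are hypothetical names; documents that were never analysed are always processed, since the text applies the skip rule only to previously analysed documents.

```python
from datetime import datetime
from typing import Iterable

def documents_to_process(
    documents: Iterable[tuple[str, datetime]],
    analysed: set[str],
    last_collection: datetime,
) -> list[str]:
    """Return ids of documents that must be (re)analysed after a restart.

    documents: (doc_id, last_modified) pairs.
    analysed: ids already analysed in an earlier run.
    last_collection: when the previous data collection took place.
    """
    result = []
    for doc_id, modified in documents:
        if doc_id in analysed and modified < last_collection:
            continue  # unchanged since the previous pass: skip it
        result.append(doc_id)
    return result
```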

While the service is running, it continuously monitors the database for changes, checks whether new documents have been added and processes any modified or added items.
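The article does not say how changes are detected while the service runs; notifications from the file system or the database would also be possible. The sketch below assumes simple polling: a hypothetical `list_documents` callback returns a `{doc_id: last_modified}` snapshot, and consecutive snapshots are compared to find added and modified documents.

```python
import time
from typing import Callable, Optional

Snapshot = dict[str, float]  # doc_id -> last-modified timestamp

def diff_snapshots(previous: Snapshot, current: Snapshot) -> tuple[list[str], list[str]]:
    """Return (added, modified) document ids; deletions are ignored here."""
    added = sorted(d for d in current if d not in previous)
    modified = sorted(d for d in current if d in previous and current[d] != previous[d])
    return added, modified

def monitor(
    list_documents: Callable[[], Snapshot],
    process: Callable[[str], None],
    interval: float = 30.0,
    iterations: Optional[int] = None,
) -> None:
    """Poll every `interval` seconds; `iterations=None` means run until stopped."""
    previous = list_documents()
    done = 0
    while iterations is None or done < iterations:
        time.sleep(interval)
        current = list_documents()
        added, modified = diff_snapshots(previous, current)
        for doc_id in added + modified:
            process(doc_id)
        previous = current
        done += 1
```

The `iterations` parameter exists only so the loop can be exercised in tests; a service would normally run it until it is stopped.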

The service saves statistical data:

  • The total number of documents in the database;
  • The total number of problem documents;
  • The number of documents processed and problem documents found since the last report was sent;
  • A list of all problem documents in the database, with the number of problem pages in each document and the individual problem pages.
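The report fields listed above can be assembled from per-document results. This is a sketch only: the `Report` structure, `build_report`, `format_report` and the text layout are hypothetical, and the input is assumed to map every document id to its list of problem pages (empty when the document is fine).

```python
from dataclasses import dataclass

@dataclass
class Report:
    total_documents: int
    total_problem_documents: int
    processed_since_last_report: int
    problems_since_last_report: int
    problem_pages: dict[str, list[int]]  # doc_id -> numbers of its problem pages

def build_report(pages_by_doc: dict[str, list[int]], processed_since_last: set[str]) -> Report:
    """pages_by_doc maps every document id to its problem pages (empty list if none)."""
    problems = {doc: pages for doc, pages in pages_by_doc.items() if pages}
    return Report(
        total_documents=len(pages_by_doc),
        total_problem_documents=len(problems),
        processed_since_last_report=len(processed_since_last),
        problems_since_last_report=sum(1 for doc in processed_since_last if doc in problems),
        problem_pages=problems,
    )

def format_report(r: Report) -> str:
    """Render the report as plain text, one problem document per line."""
    lines = [
        f"Documents in database: {r.total_documents}",
        f"Problem documents: {r.total_problem_documents}",
        f"Processed since last report: {r.processed_since_last_report}",
        f"New problem documents since last report: {r.problems_since_last_report}",
    ]
    for doc_id, pages in sorted(r.problem_pages.items()):
        lines.append(f"{doc_id}: {len(pages)} problem page(s): {', '.join(map(str, pages))}")
    return "\n".join(lines)
```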

The statistics are sent on a schedule that can be changed in the settings.
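The article does not describe the schedule format. Assuming, purely for illustration, a daily send time stored as an `"HH:MM"` string in the settings, the next send moment can be computed as follows (`parse_send_time` and `next_send_time` are hypothetical names).

```python
from datetime import datetime, time, timedelta

def parse_send_time(setting: str) -> time:
    """Parse a hypothetical "HH:MM" setting such as "08:30"."""
    return datetime.strptime(setting, "%H:%M").time()

def next_send_time(now: datetime, send_at: time) -> datetime:
    """Return the first daily send moment strictly after `now`."""
    candidate = datetime.combine(now.date(), send_at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed: use tomorrow's
    return candidate
```

Because the time is re-read from the settings before each computation, changing the setting takes effect from the next report onward.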

The service was designed and developed to handle large volumes of data. To speed up document processing, it was built as a multi-threaded application that monitors the load it places on the server and keeps it within a maximum limit.
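One simple way to bound the load is to cap the number of worker threads at a share of the available CPUs. This is a hypothetical sketch (`worker_count`, `process_all` and the `max_cpu_share` setting are illustrative names), and a static cap is a simplification of whatever load control the real service uses. In CPython, threads only rasterise in parallel if the rendering code releases the GIL, for example native code that does so; otherwise a process pool would be the alternative.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def worker_count(max_cpu_share: float = 0.5) -> int:
    """Number of threads allowed, as a share of the logical CPUs (at least 1)."""
    cpus = os.cpu_count() or 1
    return max(1, int(cpus * max_cpu_share))

def process_all(
    doc_ids: Iterable[str],
    analyse: Callable[[str], bool],
    max_cpu_share: float = 0.5,
) -> dict[str, bool]:
    """Run `analyse` on every document with a bounded pool; return id -> is_problem."""
    ids = list(doc_ids)
    with ThreadPoolExecutor(max_workers=worker_count(max_cpu_share)) as pool:
        # map() preserves input order, so results line up with ids.
        return dict(zip(ids, pool.map(analyse, ids)))
```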

Standard programming-language features turned out to be insufficient for measuring page rasterisation time accurately; the solution was to measure code execution time with WinAPI system functions.
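The article does not name the WinAPI functions it used; the usual WinAPI choice for high-resolution interval timing is `QueryPerformanceCounter` together with `QueryPerformanceFrequency`. For consistency with the other examples, this sketch uses Python's `time.perf_counter_ns`, which CPython implements with `QueryPerformanceCounter` on Windows (PEP 418). `timed` and `clock_resolutions` are illustrative names, not part of the original service.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Run fn and return (result, elapsed seconds) using a monotonic high-resolution clock."""
    start = time.perf_counter_ns()
    result = fn()
    elapsed_ns = time.perf_counter_ns() - start
    return result, elapsed_ns / 1e9

def clock_resolutions() -> dict[str, float]:
    """Report the resolution of the wall clock and the performance counter on this system."""
    return {name: time.get_clock_info(name).resolution for name in ("time", "perf_counter")}
```

Calling `clock_resolutions()` shows why a wall-clock timer can be too coarse for pages that rasterise in tens of milliseconds, while the performance counter is also monotonic and unaffected by system clock adjustments.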

Correct operation and fault tolerance were tested on a database containing 600,000 documents. Single-page rasterisation times ranged from tens of milliseconds to tens of seconds.