Removal of clutter from historical scanned documents

An innovation project led by Lumex A/S and partly financed by the Norwegian Research Council aims to improve optical character recognition (OCR) in noisy historical documents. The goal is to develop solutions that detect, locate and characterize clutter, and then apply adapted OCR to the regions that contain it. The clutter may stem from tears, cracks and aging of the paper, or from stamps and annotations that were deliberately added. Ink smears and blobs from the printing process are also frequent.

NR is contributing to this project by developing novel methods that can help to remove various types of clutter in such images. The images below show results where clutter has been automatically located and marked.

   

Detected clutter marked in red.
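
To make the idea of detecting and locating clutter concrete, here is a minimal sketch in Python. It is not the method developed in the project, which this page does not describe. It assumes a binarized page where 1 means ink, groups ink pixels into connected components, and flags components whose area exceeds a threshold as clutter candidates. The names `label_components`, `find_clutter` and `max_area` are hypothetical and chosen for illustration only.

```python
from collections import deque


def label_components(image):
    """Group ink pixels (value 1) into 4-connected components.

    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def find_clutter(image, max_area=6):
    """Return bounding boxes (top, left, bottom, right) of large components.

    Assumption for illustration: anything larger than max_area pixels is
    treated as a clutter candidate, such as an ink blob or a stamp.
    """
    boxes = []
    for pixels in label_components(image):
        if len(pixels) > max_area:
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


# Toy page: a thin 3-pixel stroke on the left, a 12-pixel blob on the right.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(find_clutter(page))  # [(1, 4, 3, 7)]
```

A size threshold alone would also flag large legitimate characters. A realistic detector would need further features such as shape, stroke width or color, which corresponds to the "characterize" step mentioned above.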

Partners

  • Lumex A/S
  • The National Archives of Norway
Postal address:
Norsk Regnesentral/
Norwegian Computing Center
P.O. Box 114 Blindern
NO-0314 Oslo
Norway
Visit address:
Norsk Regnesentral
Gaustadalleen 23a
Kristen Nygaards hus
NO-0373 Oslo.
Phone:
(+47) 22 85 25 00