Cleaning scanned documents of debris, eliminating skew and line distortion. How do you edit a scanned document? Can scanned text be converted so you can correct it?

After scanning a document, you can open the document in Word to edit it. The method depends on which version of Office is installed on your computer.

Scan a document as a PDF file and edit it in Word

Tip: Conversion works best for documents that are mostly text.

Scan the document following your scanner's instructions and save it to your computer as a PDF file.

In Word, choose File > Open.

Browse to the folder containing the PDF file and click Open.

A message will appear stating that Word is converting the PDF file into an editable Word document. Click OK.

The converted document may not match the original page for page. For example, line breaks and page breaks may fall in different places. For more information, see Opening PDFs in Word.

Additional information

The "from scanner or camera" option for scanning documents and images is not available in Word 2010. Instead, you can scan your document using a scanner and save the file to your computer.

Microsoft Office Document Imaging (MODI) was removed from Office 2010, but you can install it on your computer using one of the options described in Install the MODI application for use with Microsoft Office 2010.

Before proceeding

Open Microsoft Office Document Imaging by searching for it in the Windows Start menu.

On the File menu, select Open.

Find the scanned document and click Open.

Once the document is open in Microsoft Office Document Imaging, press CTRL+A to select the entire document, and then press CTRL+C to copy it.

Launch Microsoft Word.

On the File tab, click New.

Double-click New document.

Press CTRL+V to paste the contents of the scanned document into a new file.

The "from scanner or camera" option for scanning documents and images is not available in Microsoft Office Word 2007. Instead, you can scan a document using a scanner and save the file to your computer.

Step 1: Install Microsoft Office Document Imaging

Close all programs.

Tip: You may want to print this section before you close all programs.

Open Control Panel: right-click the Windows Start button and select Control Panel, or type control panel in the Windows search box.

In Control Panel, click Programs, and then click Programs and Features.

Right-click the name of the installed version of Microsoft Office, or right-click Microsoft Office Word 2007 (depending on whether Word is installed as part of Office or as a standalone program), and then click Change.

Select Add or Remove Features, and then click Continue.

On the Installation Options tab, click the plus sign (+) next to Office Tools.

Click the arrow next to Microsoft Office Document Imaging, select Run all from My Computer, and then click Continue.

Step 2: Create a document that you can edit

Scan the document following your scanner's instructions.

Before continuing, convert the file created by the scanner to TIFF format. You can do this in Paint or another program.

You now have a document that you can edit. Don't forget to save the new file so you don't lose your changes.

Is it possible to change scanned text? Can you edit it so that you can reuse it later? Yes, dear friends! Today it is not only possible but quite easy.

If you have the need, the desire, and some basic equipment, you can easily:

scan handwritten text (for example, notes),
scan text from a photo or picture,
edit it,
recognize text after scanning,
convert text stored as a picture (for example, in a PDF document) into ordinary text that you can edit, and so on.

In short, today you can do the same things with text in a picture as with regular text in a Word document. This is vital for anyone who constantly deals with large amounts of documentation and spends a lot of time on it, students included. Let's figure out how it's done.

What is the difference between scanning and recognition?

Scanning and text recognition are two different things. Scanning converts the pages of a document into electronic form as an image. This is done with a scanner or by taking an ordinary photo with a smartphone or digital camera.

Recognition converts the text in that scanned image into editable electronic text.


What do we need to scan and recognize text from a photo?

To scan and recognize text, you will need a few things:

Scanner. The role of a scanner can be played not only by a dedicated device but also by a camera (a smartphone camera, for example). If you use a scanner, make sure your computer has the drivers and software it needs to work fully. If you don't have a scanner yet but plan to buy one, pay attention to its speed per sheet: some devices process a sheet in 10 seconds, others need 30 or more. If you have to work with large batches of 300-400 sheets, this matters.
Text recognition programs or online services. We have already written an article about services that recognize text in scanned documents. Here we would like to recommend ABBYY FineReader. Although it is paid, its functionality is truly impressive, and if you work with huge volumes of documents it will become an indispensable assistant. There is also a free alternative, CuneiForm, which does an excellent job of scanning and OCR, although its functionality is very limited compared to FineReader.
Documents to scan. Students often have to scan magazines, articles, books, abstracts, and printouts from which they later need to copy text. A piece of advice: before you start scanning, try searching for these documents online. If others have used these materials before you, there is a good chance that some kind person has already done the work for you. Then all you have to do is copy the text of the ready-made scan and edit it.
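
A quick calculation shows why per-sheet speed matters for a 300-400 sheet job. The Python sketch below simply multiplies sheets by seconds per sheet; it assumes the rated speed holds for every sheet and ignores page turning and rescans, so real sessions will take longer.

```python
def batch_scan_minutes(sheets: int, seconds_per_sheet: float) -> float:
    """Scanning time in minutes, assuming a constant per-sheet speed."""
    return sheets * seconds_per_sheet / 60


# Compare a fast and a slow scanner on a 400-sheet batch.
for speed in (10, 30):
    print(f"400 sheets at {speed} s/sheet: {batch_scan_minutes(400, speed):.0f} min")
```

With these numbers, the slower device needs three times as long: 200 minutes versus roughly 67.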

Text scanning options

So, we have bought the scanner, prepared the documents, and installed the programs. What's next? Next, we need to configure the settings. The right settings can make the task much easier, for example, by recognizing the scanned text in a particular format or editing it in a particular mode.

In general, the quality and speed of your work will depend on the settings. So, let's figure it out together.

Resolution (DPI)

This is the image resolution, and it matters when you later edit the text of the scanned document. Set it to at least 300 DPI, and higher if possible. The higher the value, the sharper the scanned image.

Sharpness, in turn, affects the rest of the work: the sharper the image, the fewer errors the program makes, so correcting the recognized text goes faster (yes, programs make mistakes too, but first things first).
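
Since DPI means dots (pixels) per inch, you can work out how many pixels the recognition program gets for each page. Below is a Python sketch for an A4 sheet (210 x 297 mm); the page size is just an example, so substitute your own.

```python
MM_PER_INCH = 25.4


def pixel_size(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


for dpi in (150, 300, 600):
    w, h = pixel_size(210, 297, dpi)  # A4 page
    print(f"A4 at {dpi} DPI: {w} x {h} px")
```

Doubling the resolution doubles each dimension and therefore quadruples the pixel count: at 300 DPI an A4 page is 2480 x 3508 pixels, about four times as many as at 150 DPI.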

Color mode

This parameter affects scanning speed. Scanners usually have three modes: black and white (for sheets with plain printed text), grayscale (for documents with tables and simple pictures), and color (for magazines, books, and other documents where color matters). The fewer colors, the faster the document is processed.
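
One reason the mode affects speed is the amount of data per page. Assuming uncompressed storage at 1 bit per pixel for black and white, 8 bits for grayscale, and 24 bits for color (common values, though your scanner may offer others), this Python sketch compares the raw size of an A4 page at 300 DPI (2480 x 3508 pixels):

```python
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def raw_size_mib(width_px: int, height_px: int, mode: str) -> float:
    """Uncompressed image size in MiB for the given color mode."""
    bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return bits / 8 / 2**20


for mode in BITS_PER_PIXEL:
    print(f"{mode}: {raw_size_mib(2480, 3508, mode):.1f} MiB")
```

Under these assumptions, a color page carries 24 times as much raw data as a black and white one (about 24.9 MiB versus 1.0 MiB).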

Photos

As we have said, you can photograph documents instead of scanning them. But be careful: any blur, fuzziness, or other distortion of the image can hurt recognition and subsequent editing of the text.

Recognition

So, we have scanned the pages and have them in electronic form. Next, we open a recognition program (for example, FineReader) and start recognizing the text. Some programs (including the one we use) make errors in this process; in that case, you will need to select the problem area manually.

Working with text

You can select text with the Text area tool. Any tables and images can be deleted. Unusual and rare symbols, however, you will have to handle by hand. Here's what it looks like in the program:

Images

This area in the program is used to work with images and with those areas of text that are difficult to recognize.

Tables

The table selection button helps you work with tables. However, this feature is not very well developed. Sometimes it is easier to mark a table as an image area. This saves a lot of time and nerves, and you can then finish everything in ordinary Word.

Extra elements

If the page contains elements you do not need, select the unnecessary area and delete it with the eraser. Just switch to edit mode and start working. The more unnecessary elements you remove, the faster text recognition will be.

Error checking and saving work results

As we have already said, errors can occur when you use low-quality, blurry, or fuzzy images, or documents with rare characters. So always check the document after recognition.

Found an error? Just type the correct character. By the way, the program has a check mode that helps you quickly find recognition errors with little effort on your part. As soon as the check is finished, you can export the document directly from the program (save it in the format you need) to Word or any other program.
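
FineReader's check mode relies on its own dictionaries and recognition data, which we do not reproduce here. As a much simpler illustration of the idea, the Python sketch below flags recognized words that are missing from a word list; both the text and the word list are hypothetical inputs you would supply yourself.

```python
import re


def suspicious_words(text: str, vocabulary: set[str]) -> list[str]:
    """Return words from OCR output that are not in `vocabulary`, in order of appearance."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    return [w for w in words if w.lower() not in vocabulary]


vocab = {"the", "scanned", "text", "is", "ready"}
print(suspicious_words("The scarmed text is reacly.", vocab))  # ['scarmed', 'reacly']
```

A real checker would also have to cope with hyphenation, proper names, and look-alike character sequences such as "rn" and "m".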

Copy type

When you save a document (in edit mode), you will be offered three copy types. Exact copy reproduces the scanned document completely, with all its formatting. If you plan to edit the text in Word later, this option is the best choice.

Editable copy saves the text you have already edited and is good when you have a lot of post-editing to do. Plain text is ideal if you want to end up with just the text, without any other page elements.

That's all. It is difficult, long, and tedious, but scanning and recognizing text (even handwritten text) with a program is still much faster than retyping countless documents by hand. And if you don't have time for this either, contact the student service for help: they will do everything you need quickly, cheaply, and efficiently.

Sometimes there is no time to create a new document, and you urgently need to:

edit a scanned drawing or diagram, or add notes and comments to the document;
fill in the fields of a scanned form;
simply get a clean document without blots or a dirty background.

To do this, first convert the scanned document to black and white, correct any skew if necessary, and clean it of "garbage".
I can already hear the question: why not just turn on black-and-white mode when scanning? You can, but the resulting image will be of far lower quality than in the example discussed here.

There are specialized programs for this purpose, such as Spotlight Pro, but they are difficult to use and take a considerable amount of time to master.

I would like to offer a simpler yet more efficient way to process scanned documents, using ABBYY FineReader 9.0 OCR.

You can scan a document directly from the program interface or insert an already scanned picture for processing.

To make things clearer (and the task harder), let's take an already scanned book spread whose pages are skewed and yellowed with age. Using ABBYY FineReader 9.0, we will convert the image to black and white, correct the skew, and clean it of debris.

Launch ABBYY FineReader and choose Tools > Options.
In the dialog that opens, on the Scan/Open tab, select Do not process received images, since we will not be recognizing text; we only need the image. Then set the Image processing options.
Now let's clear the image of debris, that is, small dots. To do this, press the Clear image button one to three times while watching the debris disappear: the first press removes the smallest dots, and subsequent presses remove larger ones.
Section of the image before debris removal.

Section of the image after one click on Clear image.

Section of the image after a second click on Clear image.
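
FineReader does not say exactly what Clear image does internally. A common way to remove specks from a black and white scan, shown here as a Python sketch of that general technique rather than FineReader's actual algorithm, is to find groups of touching black pixels and erase the groups too small to be part of a letter. Raising the size limit on each pass roughly corresponds to repeated clicks removing larger dots; the limits themselves are assumptions you would tune.

```python
from collections import deque

BLACK, WHITE = 0, 255


def despeckle(img: list[list[int]], max_size: int) -> list[list[int]]:
    """Return a copy of a black and white image in which every 8-connected
    group of black pixels with at most `max_size` pixels is turned white."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != BLACK or seen[y][x]:
                continue
            # Collect one connected group of black pixels (breadth-first search).
            group = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and img[ny][nx] == BLACK):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) <= max_size:
                for cy, cx in group:
                    out[cy][cx] = WHITE
    return out


page = [[WHITE] * 5 for _ in range(5)]
page[0][0] = BLACK                               # a one-pixel speck
page[3][1] = page[3][2] = page[3][3] = BLACK     # a short stroke of a letter
cleaned = despeckle(page, max_size=1)
print(cleaned[0][0], cleaned[3][2])  # 255 0
```

The limit has to stay below the size of the smallest genuine marks on the page, such as periods and the dots over letters, or those disappear along with the dirt.
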
Now it remains to remove large blots and dark areas from the image. You can do this with the Eraser tool.
This tool works differently from the erasers in other common graphics editors, and for the better. There is no need to scrub back and forth over the image with the eraser cursor, repeatedly reaching for Undo after accidentally erasing something useful. In ABBYY FineReader, you erase part of an image simply by selecting it.
Holding down the left mouse button, select an area of the image of any size; make sure the selection contains only elements you want to delete, then release the button. The selected area is cleared.
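
The selection-based eraser described above amounts to filling a rectangle with the background color. A minimal Python sketch, assuming the page is stored as a list of pixel rows with 255 meaning white (an illustrative representation, not FineReader's internal format):

```python
WHITE = 255


def erase_rect(img: list[list[int]], top: int, left: int,
               bottom: int, right: int) -> list[list[int]]:
    """Fill rows top..bottom-1 and columns left..right-1 with white, in place.

    The rectangle is clipped to the image, so a selection that extends
    past the edge of the page is handled safely.
    """
    for y in range(max(0, top), min(len(img), bottom)):
        row = img[y]
        for x in range(max(0, left), min(len(row), right)):
            row[x] = WHITE
    return img


patch = [[0] * 4 for _ in range(3)]  # an all-black 3 x 4 area
print(erase_rect(patch, 0, 1, 2, 3))  # [[0, 255, 255, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
```
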
Examining the cleaned image, we notice a small problem on one of the pages: the lines of text are slightly curved. It turns out this scanning defect is easy to eliminate. Click Fix line distortion, and the defect is gone.
Image section before line distortion correction

Image section after line distortion correction
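
FineReader does not document its line-straightening algorithm, and curved lines like these need more than a simple rotation. For the simpler case of a page tilted as a whole (the skew mentioned earlier), a well-known technique is the projection profile: try a range of angles and keep the one at which the ink pixels pile up into the sharpest horizontal bands. A Python sketch, assuming you already have the coordinates of the black pixels:

```python
import math
from collections import Counter


def estimate_skew(black_pixels: list[tuple[int, int]], angles: list[float]) -> float:
    """Return the candidate angle (degrees) at which text lines look most horizontal.

    black_pixels holds (x, y) ink coordinates with y growing downwards, so a
    positive angle means the lines slope down to the right. Each pixel is
    projected onto the axis perpendicular to lines of the candidate slope;
    when the angle matches the real tilt, every text line collapses into a
    few bins and the sum of squared bin counts is largest.
    """
    best_angle, best_score = angles[0], -1
    for angle in angles:
        a = math.radians(angle)
        sin_a, cos_a = math.sin(a), math.cos(a)
        profile = Counter(round(y * cos_a - x * sin_a) for x, y in black_pixels)
        score = sum(n * n for n in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic page: four "text lines" drawn with a 3-degree tilt.
tilt = math.tan(math.radians(3))
pixels = [(x, y0 + round(x * tilt)) for y0 in (20, 40, 60, 80) for x in range(200)]
candidates = [i * 0.5 - 5 for i in range(21)]  # -5.0 to +5.0 degrees in 0.5 steps
print(estimate_skew(pixels, candidates))
```

Once the angle is known, the page would be rotated by the opposite amount in an image editor or library; that step is not shown here.
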
That's it: the image is now clean, with no skew or line distortion.

It can be printed without wasting ink on blots and dirt, e-mailed to a friend without embarrassment over a poor scan, or saved for future use in any of the supported formats.

To save the image, choose File > Save Image As... and select any of the supported formats:
Bitmap, black and white (*.bmp; *.dib; *.rle)
Bitmap, gray (*.bmp; *.dib; *.rle)
Bitmap, color (*.bmp; *.dib; *.rle)
DCX, black and white (*.dcx)
DCX, gray (*.dcx)
DCX, color (*.dcx)
JBIG2 (*.jb2; *.jbig2)
JPEG 2000, gray (*.jp2; *.j2k)
JPEG 2000, color (*.jp2; *.j2k)
JPEG, gray (*.jpg; *.jpeg)
JPEG, color (*.jpg; *.jpeg)
PCX, black and white (*.pcx)
PCX, gray (*.pcx)
PCX, color (*.pcx)
PNG, black and white (*.png)
PNG, gray (*.png)
PNG, color (*.png)
TIFF, black and white, uncompressed (*.tif; *.tiff)
TIFF, black and white, compression: PackBits (*.tif; *.tiff)
TIFF, black and white, compression: ZIP (*.tif; *.tiff)
TIFF, black and white, compression: LZW (*.tif; *.tiff)
TIFF, black and white, compression: Group 4 (*.tif; *.tiff)
TIFF, gray, uncompressed (*.tif; *.tiff)
TIFF, gray, compression: PackBits (*.tif; *.tiff)
TIFF, gray, compression: JPEG (*.tif; *.tiff)
TIFF, gray, compression: ZIP (*.tif; *.tiff)
TIFF, gray, compression: LZW (*.tif; *.tiff)
TIFF, color, uncompressed (*.tif; *.tiff)
TIFF, color, compression: PackBits (*.tif; *.tiff)
TIFF, color, compression: JPEG (*.tif; *.tiff)
TIFF, color, compression: ZIP (*.tif; *.tiff)
TIFF, color, compression: LZW (*.tif; *.tiff)
PDF (*.pdf)
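
Several of the TIFF variants above use PackBits, a simple run-length compression that suits black and white scans with long runs of identical bytes (for example, blank paper). The Python sketch below is our own implementation of the scheme as described in the TIFF 6.0 specification, not code from FineReader: a header byte n from 0 to 127 means "copy the next n+1 bytes literally", and a header from 129 to 255 (that is, -127 to -1 as a signed byte) means "repeat the next byte 257-n times".

```python
def packbits_encode(data: bytes) -> bytes:
    """Compress `data` with PackBits run-length encoding."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # header -(run - 1) stored as an unsigned byte
            out.append(data[i])
            i += run
        else:
            # Gather literal bytes until a repeated pair starts or 128 bytes are collected.
            start = i
            i += 1
            while i < n and i - start < 128:
                if i + 1 < n and data[i] == data[i + 1]:
                    break
                i += 1
            out.append(i - start - 1)  # header: literal count minus one
            out.extend(data[start:i])
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Expand PackBits-compressed `data`."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:      # literal block of header + 1 bytes
            out.extend(data[i:i + header + 1])
            i += header + 1
        elif header > 128:    # repeat the next byte 257 - header times
            out.extend(data[i:i + 1] * (257 - header))
            i += 1
        # header == 128 is a no-op
    return bytes(out)


row = bytes([0x00] * 100 + [0xFF, 0x0F, 0xFF])  # mostly blank scan line
packed = packbits_encode(row)
print(len(row), "->", len(packed))  # 103 -> 6
```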

Page scans cleaned of "garbage", with line distortion corrected.

Note that many people are now converting their documents (drawings, diagrams, books, and so on) into electronic form. For large volumes of work, a camera is more convenient for this. Some scanner and camera models that support document capture come bundled with ABBYY FineReader. Keep this in mind when choosing a device for digitizing documents: given its main purpose, optical character recognition, FineReader is as useful to anyone who works with documents as a text editor is.

A scanner is a device that captures objects, images, or documents and saves their visual image as a graphic file that can then be edited in various ways. Why would you do this, and how do you edit a scanned document?

Here, "editing" can mean two things: modifying the image itself, or editing the text and other formatting objects it contains.

Editing as Image Modification

As noted above, when a scanner processes a document or other object, it creates a static image of it as a separate graphic file, for example in JPEG format. The most common editing needs are:

basic adjustments (resizing, mirroring, rotating by a given number of degrees, adjusting color balance);
editing image elements (changing their appearance, deleting them, adding new ones).
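
Under the hood, several of these basic adjustments are simple rearrangements of pixels. A Python sketch on an image stored as a list of pixel rows (an illustrative representation, not how Paint works internally) shows rotation by 90 degrees, mirroring, and nearest-neighbor resizing:

```python
Image = list[list[int]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate a quarter turn clockwise: the bottom row becomes the left column."""
    return [list(row) for row in zip(*img[::-1])]


def mirror_horizontally(img: Image) -> Image:
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Resize by copying the nearest source pixel (no smoothing)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


tiny = [[1, 2],
        [3, 4]]
print(rotate_90_clockwise(tiny))   # [[3, 1], [4, 2]]
print(mirror_horizontally(tiny))   # [[2, 1], [4, 3]]
print(resize_nearest(tiny, 4, 4))  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Rotation by arbitrary angles and color balance need interpolation and color arithmetic, so in practice they are best left to the editor.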

Basic adjustments to a scanned image can be made with the most accessible software, installed in Windows by default. What is the easiest program for editing a scanned document? Probably Paint. Its key editing options are in the program menu and on its toolbar.

Starting Paint is easy: in Windows 7 and earlier, click Start > All Programs > Accessories > Paint. Then open the file you need in Paint and make the necessary adjustments.

A more complex procedure, editing image elements, can involve a wide range of operations: from minor retouching or adding lettering to merging the image with another graphic file into a collage. Depending on the complexity of the task, you will need different types of software.

If the operations are simple (for example, you only need to add lettering), Paint will do. On its toolbar, on the left side of the window, select Text and use it to type text onto the image.

Editing texts and other formatting objects in an image

How do you edit the text in scanned documents? Recognition programs such as ABBYY FineReader work like this: they process the image, recognize the text and other formatting objects in it, and put them into a separate file that you can open in a text editor (Word, OpenOffice, or similar) and edit freely.

You can then place the modified text (tables, lists) back onto the scanned image it was recognized from. To do this, open the graphic file in an image editor such as Paint in one window, and the recognized and edited text (tables, lists) in another. Make the second window active, take a screenshot of it with the Print Screen (SysRq) key, paste it into Paint with Ctrl+V, and position it on the scanned image as needed.
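
Positioning the pasted text on the scan comes down to copying one block of pixels onto another at a chosen position. A Python sketch with a hypothetical `paste` helper, treating both images as lists of pixel rows:

```python
def paste(base: list[list[int]], patch: list[list[int]],
          top: int, left: int) -> list[list[int]]:
    """Copy `patch` onto `base` in place with its top-left corner at (top, left).

    Pixels that would fall outside `base` are skipped.
    """
    for dy, patch_row in enumerate(patch):
        y = top + dy
        if not 0 <= y < len(base):
            continue
        for dx, value in enumerate(patch_row):
            x = left + dx
            if 0 <= x < len(base[y]):
                base[y][x] = value
    return base


scan = [[255] * 3 for _ in range(3)]
print(paste(scan, [[0, 0], [0, 0]], 2, 2))  # [[255, 255, 255], [255, 255, 255], [255, 255, 0]]
```

Note that a screenshot also carries the white background around the text, so the pasted block covers whatever was underneath it on the scan.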

A magazine cover designer might face a similar need: for example, when the text on a cover has to be edited but the source file is unavailable. The designer can recognize the relevant paragraphs from the printed page, correct them, and place the corrected text back onto the scanned image of the page.