Pattern recognition systems in world information resources. Review of existing methods of pattern recognition. Examples of pattern recognition problems

Chapter 3: Pattern Recognition (Identification) Systems

  • The concept of an image. The problem of learning pattern recognition. Geometric and structural approaches. Compactness hypothesis. Learning and self-learning. Adaptation and learning.
  • Methods for learning pattern recognition - perceptrons, neural networks, the method of potential functions, the group method of data handling (GMDH), the method of limiting simplifications, collectives (committees) of decision rules.
  • Methods and algorithms for analyzing the structure of multidimensional data - cluster analysis, hierarchical grouping.

The concept of an image

Image, class - a classification grouping in the classification system that unites (singles out) a certain group of objects according to some attribute.

The figurative perception of the world is one of the mysterious properties of the living brain. It makes it possible to make sense of the endless stream of perceived information and to stay oriented in an ocean of disparate data about the outside world. When we perceive the external world, we always classify our sensations, that is, we divide them into groups of similar but not identical phenomena. For example, despite significant differences, one group includes all the letters A written in different handwriting, or all sounds corresponding to the same note played in any octave and on any instrument. Likewise, an operator controlling a technical object responds with the same reaction to a whole set of states of that object. Characteristically, to form the concept of a group of perceptions of a certain class, it is enough to become acquainted with a small number of its representatives. A child who has been shown a letter just once can find it in a text set in various fonts, or recognize it even when it is written in a deliberately distorted form. This property of the brain allows us to formulate the concept of an image.

Images have a characteristic property: acquaintance with a finite number of phenomena from the same set makes it possible to recognize an arbitrarily large number of its representatives. Examples of images are river, sea, liquid, Tchaikovsky's music, Mayakovsky's poems, etc. A certain set of states of a control object can also be regarded as an image; this whole set of states is characterized by the fact that the same action on the object is required to achieve a given goal. Images have objective properties in the sense that different people, learning from different observational material, for the most part classify the same objects in the same way and independently of each other. It is this objectivity of images that allows people all over the world to understand each other.

The ability to perceive the external world in the form of images makes it possible to recognize, with a certain degree of reliability, an infinite number of objects on the basis of acquaintance with a finite number of them, and the objective nature of this main property of images makes it possible to model the process of their recognition. Being a reflection of objective reality, the concept of an image is as objective as reality itself, and can therefore itself be the object of a special study.

In the literature on the problem of learning pattern recognition, the concept of a class is often used instead of the concept of an image.

The Problem of Learning Pattern Recognition

One of the most interesting properties of the human brain is the ability to respond to an infinite number of environmental conditions with a finite number of reactions. Perhaps it was this property that allowed humans to reach the highest form of existence of living matter, expressed in the ability to think, i.e., to actively reflect the objective world in the form of images, concepts, judgments, etc. It is therefore in the study of the physiological properties of the brain that the problem of learning pattern recognition arose.

Consider an example of tasks from the field of learning pattern recognition.


Fig. 1

Fig. 1 presents 12 tasks in which one must find features that distinguish the left triad of pictures from the right one. Solving these problems requires modeling logical thinking in full.

In general, the pattern recognition problem consists of two parts: learning and recognition. Learning is carried out by showing individual objects together with an indication of the image to which each belongs. As a result of learning, the recognition system must acquire the ability to respond with the same reaction to all objects of one image and with different reactions to objects of different images. It is very important that the learning process be completed by showing only a finite number of objects, without any other hints. The learning objects may be pictures or other visual representations (letters), or various phenomena of the external world, for example sounds, the state of the body in medical diagnosis, or the state of a technical object in control systems. It is important that only the objects themselves and their membership in an image are indicated during learning. Learning is followed by the recognition of new objects, which characterizes the behavior of an already trained system. Automating these procedures is the problem of learning pattern recognition. When a person works out or invents a classification rule and then imposes it on the machine, the recognition problem is solved only partially, since the main and most important part of the problem (learning) is taken over by the person.
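The two parts described above, learning from a finite labeled sequence and then recognizing new objects, can be sketched as follows. This is only an illustration under assumptions: the nearest-centroid rule, the function names `train` and `recognize`, and the sample points are all hypothetical and are not taken from the text.

```python
from collections import defaultdict
from math import dist

def train(samples):
    """Learning phase: samples is a list of (feature_vector, image_label) pairs."""
    groups = defaultdict(list)
    for vector, label in samples:
        groups[label].append(vector)
    # Summarize each image by the mean of its training vectors (its centroid).
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }

def recognize(centroids, vector):
    """Recognition phase: assign a new object to the image with the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))

# A finite training sequence: objects shown together with their image labels.
training_sequence = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((4.0, 4.0), "B"), ((3.8, 4.2), "B"),
]
centroids = train(training_sequence)
print(recognize(centroids, (0.9, 1.1)))  # an object not shown during learning
```

Nothing but the objects and their labels is passed to `train`, which mirrors the requirement that learning end after a finite number of examples without other hints.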

The problem of learning pattern recognition is interesting from both an applied and a fundamental point of view. From the applied point of view, solving it is important above all because it opens up the possibility of automating many processes that until now have been associated only with the activity of a living brain. The fundamental significance of the problem is closely tied to a question that arises more and more often as ideas in cybernetics develop: what can a machine do, and what can it fundamentally not do? To what extent can the capabilities of a machine be brought closer to those of a living brain? In particular, can a machine learn to adopt from a person the skill of performing certain actions depending on the situations that arise in the environment? So far it has become clear only that if a person can first become aware of the skill and then describe it, i.e., indicate why he performs particular actions in response to each state of the external environment, or how (by what rule) he combines individual objects into images, then such a skill can be transferred to a machine without fundamental difficulties. If a person has a skill but cannot explain it, then there is only one way to transfer the skill to a machine: learning by examples.

The range of tasks that can be solved with recognition systems is extremely wide. It includes not only the recognition of visual and auditory images, but also the recognition of complex processes and phenomena that arise, for example, when the head of an enterprise chooses appropriate actions or when the optimal control of technological, economic, transport or military operations is selected. In each of these tasks, certain phenomena, processes or states of the external world are analyzed; these are referred to below as objects of observation. Before any object can be analyzed, definite, ordered information about it must be obtained in some way. Such information characterizes the objects: it is their mapping onto the set of perceiving organs of the recognition system.

But each object of observation can affect the perceiving organs differently, depending on the conditions of perception. For example, any letter, even when written in exactly the same way, can in principle be shifted arbitrarily relative to the perceiving organs. In addition, objects of the same image can differ considerably from one another and will naturally affect the perceiving organs in different ways.

Each mapping of an object onto the perceiving organs of the recognition system, regardless of the object's position relative to these organs, is usually called a representation of the object, and sets of such representations united by some common properties constitute images.

When solving control problems by image recognition methods, the term "state" is used instead of the term "image". A state is a certain form of displaying the measured current (or instantaneous) characteristics of the observed object. The set of states determines the situation. The concept of "situation" is analogous to the concept of "image". But this analogy is not complete, since not every image can be called a situation, although every situation can be called an image.

A situation is usually called a certain set of states of a complex object, each of which is characterized by the same or similar characteristics of the object. For example, if a certain control object is considered as an object of observation, then the situation combines such states of this object in which the same control actions should be applied. If the object of observation is a military game, then the situation combines all the states of the game that require, for example, a powerful tank attack with air support.

The choice of the initial description of objects is one of the central tasks in learning pattern recognition. With a well-chosen initial description (feature space), the recognition task may turn out to be trivial; conversely, a poorly chosen initial description may lead either to very difficult further processing of the information or to no solution at all. For example, if objects that differ in color are to be recognized, and signals from weight sensors are chosen as the initial description, then the recognition problem cannot be solved in principle.
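The color-versus-weight example can be made concrete with a small sketch. The objects, the hue and weight values, and the helper `best_threshold_accuracy` are invented for illustration; the single-threshold rule is just one simple way to test whether a feature carries class information.

```python
def best_threshold_accuracy(values, labels):
    """Best training accuracy of a rule 'value > t' (or its negation) on one feature."""
    best = 0.0
    for t in sorted(set(values)):
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        accuracy = hits / len(values)
        # The negated rule 'value <= t' has accuracy 1 - accuracy.
        best = max(best, accuracy, 1 - accuracy)
    return best

# Hypothetical objects: (hue in degrees, weight in grams); label 1 = "red", 0 = "blue".
objects = [(5, 100), (10, 120), (8, 110), (240, 100), (235, 120), (245, 110)]
labels = [1, 1, 1, 0, 0, 0]
hue = [o[0] for o in objects]
weight = [o[1] for o in objects]

print(best_threshold_accuracy(hue, labels))     # color feature separates the classes
print(best_threshold_accuracy(weight, labels))  # weight feature is no better than chance
```

In this made-up data the weights are distributed identically in both classes, so no rule based on weight alone can do better than guessing, while the hue feature makes the task trivial.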

Geometric and structural approaches

Every time we are faced with unfamiliar problems, there is a natural desire to present them in the form of some easily understood model that would allow us to comprehend the problem in terms that are easily reproduced by our imagination. And since we exist in space and time, the most understandable for us is the spatio-temporal interpretation of tasks.

Any representation obtained by observing an object during learning or testing can be treated as a vector, and hence as a point in some feature space. If it is claimed that presented representations can be unambiguously attributed to one of two (or several) images, then it is thereby claimed that the space contains two (or several) regions with no points in common, and that the representations are points of these regions. Each such region can be given a name corresponding to the image.

Let us now interpret the process of learning pattern recognition in terms of a geometric picture, restricting ourselves for now to the case of recognizing only two patterns. It is assumed in advance only that it is required to separate two regions in some space and that only points from these regions are displayed. These areas themselves are not predetermined, i.e., there is no information about the location of their boundaries or rules for determining whether a point belongs to a particular area.

In the course of training, points randomly selected from these areas are presented, and information is reported about which area the presented points belong to. No additional information about these areas, i.e. about the location of their boundaries, is given during training. The goal of learning is either to build a surface that would separate not only the points shown in the learning process, but also all other points belonging to these areas, or to build surfaces that bound these areas so that each of them contains only points of the same image. In other words, the goal of learning is to construct such functions from image vectors that would be, for example, positive at all points of one image and negative at all points of another image. Due to the fact that the regions do not have common points, there is always a whole set of such separating functions, and as a result of learning, one of them must be built.
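As a hedged illustration of a separating function that is positive on one image and negative on the other, the sketch below builds a linear function from the class means. The construction (difference of means, threshold at the midpoint) is an assumption chosen for simplicity; it is only one member of the whole set of possible separating functions and works here because the two hypothetical regions are compact and far apart.

```python
def fit_separating_function(points_a, points_b):
    """Build f(x) = w . (x - midpoint), with w the difference of the class means."""
    dim = len(points_a[0])
    mean_a = [sum(p[i] for p in points_a) / len(points_a) for i in range(dim)]
    mean_b = [sum(p[i] for p in points_b) / len(points_b) for i in range(dim)]
    w = [a - b for a, b in zip(mean_a, mean_b)]          # normal to the separating plane
    mid = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]  # point the plane passes through

    def f(x):
        return sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))

    return f

# Points shown during training (invented values).
region_a = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
region_b = [(4.0, 4.0), (4.5, 3.5), (3.5, 4.5)]
f = fit_separating_function(region_a, region_b)

print(all(f(p) > 0 for p in region_a), all(f(p) < 0 for p in region_b))
print(f((1.2, 0.9)) > 0)  # a point of the first region that was not shown in training
```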

If the presented representations belong to more than two images, the task is to build, from the points shown during training, a surface that separates all the regions corresponding to these images from one another. This problem can be solved, for example, by constructing a function that takes the same value at all points of each region and different values at points of different regions.



Fig. 2 - Two images.

At first glance, it seems that knowing just a certain number of points from the area is not enough to separate the entire area. Indeed, one can specify an innumerable number of different regions that contain these points, and no matter how the surface that selects the region is built from them, it is always possible to specify another region that intersects the surface and at the same time contains the points shown. However, it is known that the problem of approximating a function from information about it in a limited set of points, which is much narrower than the entire set on which the function is given, is a common mathematical problem of approximating functions. Of course, the solution of such problems requires the introduction of certain restrictions on the class of functions under consideration, and the choice of these restrictions depends on the nature of the information that the teacher can add in the learning process. One such hint is the conjecture about the compactness of images. It is intuitively clear that the approximation of the separating function will be an easier task, the more compact and the more spaced out the regions to be separated. So, for example, in the case shown in Fig. 2a, the separation is obviously simpler than in the case shown in Fig. 2b. Indeed, in the case shown in Fig. 2a, the regions can be separated by a plane, and even with large errors in the definition of the separating function, it will still continue to separate the regions. In the case in Fig. 2b, the separation is carried out by an intricate surface, and even slight deviations in its shape lead to separation errors. It was this intuitive notion of relatively easily separable regions that led to the compactness conjecture.

Along with the geometric interpretation of the problem of learning pattern recognition, there is another approach, called structural or linguistic. Let us explain the linguistic approach using the example of visual image recognition. First, a set of initial concepts is singled out: typical fragments found in the pictures, and characteristics of the mutual arrangement of fragments such as "left", "bottom", "inside", etc. These initial concepts form a dictionary from which various logical statements, sometimes called assumptions, can be built. The task is to select, from the large number of statements that could be constructed from these concepts, those most significant for the particular case.

Further, looking at a finite and, if possible, a small number of objects from each image, it is necessary to construct a description of these images. The constructed descriptions must be so complete as to resolve the question of which image the given object belongs to. When implementing the linguistic approach, two problems arise: the problem of constructing an initial dictionary, i.e., a set of typical fragments, and the problem of constructing description rules from the elements of a given dictionary.

Within the linguistic interpretation, an analogy is drawn between the structure of images and the syntax of a language. The analogy is attractive because it makes it possible to use the apparatus of mathematical linguistics, i.e., methods that are syntactic in nature. This apparatus can be applied to describe the structure of images only after the images have been segmented into their component parts, that is, after words for describing typical fragments and methods for finding them have been developed. After this preliminary work, which provides the words, the linguistic tasks proper arise: the automatic grammatical parsing of descriptions for the purpose of image recognition. An independent field of research emerges here, one that requires not only knowledge of the basics of mathematical linguistics but also mastery of techniques developed specifically for linguistic image processing.

Compactness hypothesis

If we assume that the feature space is formed during learning with the planned classification in mind, then we may hope that the very specification of the feature space imposes a property thanks to which the images in this space are easily separated. As work in pattern recognition developed, it was these hopes that prompted the compactness hypothesis, which states that images correspond to compact sets in the feature space. By a compact set we shall, for the time being, mean a "clump" of points in the space of representations, assuming that the clumps are separated by sparse regions.

This hypothesis could not always be confirmed experimentally. Most importantly, however, the tasks in which the compactness hypothesis held well (Fig. 2a) all, without exception, found a simple solution. Conversely, the tasks for which the hypothesis was not confirmed (Fig. 2b) either were not solved at all or were solved with great difficulty and with additional tricks. This fact raised at least some doubt about the validity of the compactness hypothesis, since a single counterexample is enough to refute any hypothesis. At the same time, the fact that the hypothesis held wherever the problem of learning pattern recognition was solved well sustained interest in it. The compactness hypothesis itself turned into an indicator of whether a recognition problem can be solved satisfactorily.
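The text does not formalize compactness, so the following is only one possible, assumed way to put a number on it: the mean distance between points of the same image divided by the mean distance between points of different images. Small values suggest well-separated clumps; values near or above 1 suggest intermixed regions. The point sets are invented.

```python
from itertools import combinations, product
from math import dist

def compactness_ratio(class_a, class_b):
    """Mean within-class distance divided by mean between-class distance."""
    within = [dist(p, q) for cls in (class_a, class_b) for p, q in combinations(cls, 2)]
    between = [dist(p, q) for p, q in product(class_a, class_b)]
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Two tight, distant clumps versus two interleaved point sets.
compact_a = [(0, 0), (0, 1), (1, 0)]
compact_b = [(10, 10), (10, 11), (11, 10)]
mixed_a = [(0, 0), (10, 10), (0, 11)]
mixed_b = [(0, 1), (10, 11), (1, 10)]

print(compactness_ratio(compact_a, compact_b))
print(compactness_ratio(mixed_a, mixed_b))
```

The first ratio comes out far smaller than the second, which matches the intuition that the compact case is the easy one to separate.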

The formulation of the compactness hypothesis brings us close to the concept of an abstract image. If the coordinates of the space are chosen at random, then the representations in it will be distributed at random, more densely in some parts of the space than in others. Let us call such a randomly chosen space an abstract space. In this abstract space there will almost certainly be compact sets of points. Therefore, in accordance with the compactness hypothesis, the sets of objects that correspond to compact sets of points in an abstract space can reasonably be called the abstract images of that space.

Learning and self-learning. Adaptation and learning

All the pictures in Fig. 1 illustrate the learning task. In each of these problems, several examples (a training sequence) of correctly solved problems are given. Suppose it were possible to identify some universal property that does not depend on the nature of the images or on their representations, but determines only their separability. Then, along with the usual task of learning to recognize, which uses information about the membership of each object of the training sequence in one image or another, a different classification problem could be posed: the so-called problem of learning without a teacher. At the descriptive level, a task of this kind can be formulated as follows: objects are presented to the system simultaneously or sequentially, without any indication of the images to which they belong. The input device of the system maps the set of objects onto a set of representations and, using some property of image separability built into it beforehand, classifies these objects on its own. After such a process of self-learning, the system should be able to recognize not only objects already familiar to it (objects from the training sequence) but also objects that have not been presented before. Self-learning of a system is thus a process as a result of which the system, without the help of a teacher, acquires the ability to produce the same reaction to representations of objects of the same image and different reactions to representations of objects of different images. The role of the teacher in this case is only to supply the system with some objective property that is the same for all images and determines the ability to divide a set of objects into images.

It turns out that such an objective property is the property of compactness of images. The mutual arrangement of points in the selected space already contains information about how the set of points should be divided. This information determines the property of pattern separability, which is sufficient for self-learning of the pattern recognition system.
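The claim that the arrangement of points alone suggests how to divide them can be illustrated with k-means, a common clustering procedure. It is used here only as an example of self-learning and is not named in the text; the points, the number of clusters and the choice of initial centers are assumptions.

```python
from math import dist

def k_means(points, centers, iterations=10):
    """Plain k-means: alternately assign points to centers and recompute the centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centers)
        ]
    return centers, clusters

# Unlabeled objects: no information about image membership is given.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers, clusters = k_means(points, centers=[points[0], points[3]])
print(clusters)
```

The only prior knowledge built into the procedure is a notion of closeness, which plays the role of the compactness property mentioned above.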

Most of the well-known self-learning algorithms are capable of extracting only abstract images, i.e., compact sets in given spaces. The differences between them appear to lie in how the notion of compactness is formalized. However, this does not reduce, and sometimes even increases, the value of self-learning algorithms, since often the images themselves are not predetermined by anyone, and the task is to determine which subsets of representations in a given space constitute images. A good example of such a problem statement is sociological research, in which groups of people are singled out on the basis of a set of questions. Understood in this way, self-learning algorithms generate previously unknown information about the existence, in a given space, of images of which no one had any idea before.

In addition, the result of self-learning characterizes the suitability of the chosen space for a specific recognition learning task. If the abstract images identified in the process of self-learning coincide with the real ones, then the space is well chosen. The more abstract images differ from real ones, the more "inconvenient" the chosen space is for a specific task.

Learning is usually understood as the process of developing in a system a particular reaction to groups of identical external signals by repeatedly applying an external correction to the system. Such external correction in learning is usually called "encouragement" and "punishment". The mechanism that generates this correction almost completely determines the learning algorithm. Self-learning differs from learning in that no additional information about the correctness of the reaction is reported to the system.

Adaptation is the process of changing the parameters and structure of the system, and possibly the control actions, on the basis of current information in order to achieve a certain state of the system under initial uncertainty and changing operating conditions.

Learning is a process, as a result of which the system gradually acquires the ability to respond with the necessary reactions to certain sets of external influences, and adaptation is the adjustment of the parameters and structure of the system in order to achieve the required quality of control in conditions of continuous changes in external conditions.

Recognition tasks are solved quite often in everyday life, for example, when crossing or driving across a street at a traffic light. Recognizing the color of the lit traffic light and knowing the rules of the road makes it possible to decide correctly whether or not to cross the street at that moment.

In the course of biological evolution, many animals came to solve pattern recognition problems quite well with the help of their visual and auditory apparatus. The creation of artificial pattern recognition systems remains a difficult theoretical and technical problem. The need for such recognition arises in a wide variety of areas, from military affairs and security systems to the digitization of all kinds of analog signals.

Traditionally, image recognition tasks are included in the scope of artificial intelligence tasks.

Directions in pattern recognition

There are two main directions:

  • The study of the recognition abilities possessed by living beings, their explanation and modeling;
  • Development of the theory and methods for constructing devices designed to solve particular problems for applied purposes.

Formal statement of the problem

Pattern recognition is the assignment of initial data to a certain class by highlighting essential features that characterize these data from the total mass of non-essential data.

In stating recognition problems, one tries to use mathematical language and, in contrast to the theory of artificial neural networks, where results are obtained primarily by experiment, to replace experiment with logical reasoning and mathematical proof.

Most often, monochrome images are considered in pattern recognition problems, which makes it possible to treat an image as a function on a plane. Consider a point set on a plane T, where a function f(x, y) gives at each point of the image its characteristic: brightness, transparency or optical density. Such a function is a formal record of the image.

The set of all possible functions f(x, y) on the plane T is a model of the set of all images X. By introducing a notion of similarity between images, one can pose the recognition problem. The specific form of this statement depends strongly on the subsequent stages of recognition under one approach or another.
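The text leaves the notion of similarity open. As one assumed choice, the sketch below samples the brightness function on a small grid and measures dissimilarity as the mean squared brightness difference; the function name `image_distance` and the 3x3 pictures are hypothetical.

```python
def image_distance(img_a, img_b):
    """Mean squared brightness difference between two equally sized grids."""
    if len(img_a) != len(img_b) or any(len(r) != len(s) for r, s in zip(img_a, img_b)):
        raise ValueError("images must have the same shape")
    diffs = [(a - b) ** 2 for r, s in zip(img_a, img_b) for a, b in zip(r, s)]
    return sum(diffs) / len(diffs)

# Brightness sampled on a 3x3 grid: 1 = bright, 0 = dark.
vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
noisy_vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
horizontal_bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]

print(image_distance(vertical_bar, noisy_vertical_bar))
print(image_distance(vertical_bar, horizontal_bar))
```

With this measure the noisy vertical bar is closer to the clean vertical bar than the horizontal bar is, so a rule of the form "assign to the most similar known image" would group the two vertical bars together.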

Pattern recognition methods

For optical image recognition, one can apply exhaustive search over views of the object at different angles, scales, offsets, etc. For letters, the search must also cover the typeface, font properties, etc.
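A minimal sketch of such a search, restricted to offsets only: the template is tried at every position inside the picture and the position with the smallest squared error is kept. Rotations and scales would add further loops. The picture, the template and the name `best_offset` are illustrative assumptions.

```python
def best_offset(image, template):
    """Try every offset of the template inside the image; return (row, col, error)."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            # Sum of squared differences between the template and the image patch.
            err = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th) for j in range(tw)
            )
            if best is None or err < best[2]:
                best = (r, c, err)
    return best

image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 1], [1, 0]]
print(best_offset(image, template))  # (1, 2, 0): exact match at row 1, column 2
```

The cost grows with the product of the number of offsets, angles, scales and fonts tried, which is the main drawback of the exhaustive approach.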

The second approach is to find the contour of the object and examine its properties (connectivity, presence of corners, etc.).

Another approach is to use artificial neural networks. This method requires either a large number of examples of the recognition task (with correct answers), or a special neural network structure that takes into account the specifics of this task.

Perceptron as a method of pattern recognition

F. Rosenblatt introduced the concept of a brain model, whose task is to show how psychological phenomena can arise in a physical system with known structure and functional properties, and described the simplest discrimination experiments. These experiments belong entirely to pattern recognition methods, but differ in that the solution algorithm is not deterministic.

The simplest experiment, on the basis of which it is possible to obtain psychologically significant information about a certain system, boils down to the fact that the model is presented with two different stimuli and is required to respond to them in different ways. The purpose of such an experiment may be to study the possibility of their spontaneous discrimination by the system in the absence of intervention from the experimenter, or, conversely, to study forced discrimination, in which the experimenter seeks to teach the system to carry out the required classification.

In a learning experiment, a perceptron is usually presented with a certain sequence of images, which includes representatives of each of the classes to be distinguished. According to some memory modification rule, the correct choice of reaction is reinforced. Then the control stimulus is presented to the perceptron and the probability of obtaining the correct response for stimuli of this class is determined. Depending on whether the selected control stimulus matches or does not match with one of the images that were used in the training sequence, different results are obtained:

  • 1. If the control stimulus does not coincide with any of the training stimuli, then the experiment involves not only pure discrimination but also elements of generalization.
  • 2. If the control stimulus excites a set of sensory elements completely different from those activated by previously presented stimuli of the same class, then the experiment is a study of pure generalization.

Perceptrons do not have the capacity for pure generalization, but they function quite satisfactorily in discrimination experiments, especially if the control stimulus matches closely enough with one of the patterns about which the perceptron has already accumulated some experience.
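The forced-discrimination experiment can be sketched with a single threshold unit trained by error correction. This is a deliberately simplified stand-in, not Rosenblatt's full model with sensory and associative elements; the stimuli, labels and function names are assumptions made for the example.

```python
def train_perceptron(samples, epochs=20, rate=1.0):
    """Error-correction training; samples are (feature_vector, label), label +1 or -1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, label in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * activation <= 0:  # wrong or undecided response: correct the memory
                weights = [w + rate * label * xi for w, xi in zip(weights, x)]
                bias += rate * label
                errors += 1
        if errors == 0:  # every training stimulus now produces the required response
            break
    return weights, bias

def respond(weights, bias, x):
    """Response of the trained unit to a control stimulus: +1 or -1."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Training sequence with representatives of both classes (invented values).
training_stimuli = [((2, 1), 1), ((3, 2), 1), ((-1, -2), -1), ((-2, -1), -1)]
weights, bias = train_perceptron(training_stimuli)

# Control stimuli close to, but not identical with, the training stimuli.
print(respond(weights, bias, (2.5, 1.5)), respond(weights, bias, (-1.5, -1.5)))
```

The control stimuli here lie close to the training ones, which is the situation in which the text says perceptrons discriminate satisfactorily.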

Examples of pattern recognition problems

  • Letter recognition.
  • Barcode recognition.
  • License plate recognition.
  • Face recognition.
  • Speech recognition.
  • Image recognition.
  • Recognition of local areas of the earth's crust in which mineral deposits are located.

Wikimedia Foundation. 2010 .

The image is understood as a structured description of the object or phenomenon under study, represented by a feature vector, each element of which represents the numerical value of one of the features that characterize the corresponding object.

The general structure of the recognition system is as follows:

The meaning of the recognition problem is to establish whether the studied objects have a fixed finite set of features that allow them to be assigned to a certain class. Recognition tasks have the following characteristic features:

1. These are information tasks consisting of two stages:

a. Bringing the source data to a form convenient for recognition.

b. Recognition itself is an indication of the belonging of an object to a certain class.

2. In these problems, one can introduce the concept of analogy or similarity of objects and formulate the concept of proximity of objects as a basis for assigning objects to the same class or different classes.

3. In these tasks, it is possible to operate with a set of precedents - examples whose classification is known and which, in the form of formalized descriptions, can be presented to the recognition algorithm to adjust to the task in the learning process.

4. For these problems, it is difficult to build formal theories and apply classical mathematical methods: often the information needed for an accurate mathematical model is unavailable, or the gain from using the model and mathematical methods is not commensurate with the costs.

5. In these tasks, “bad information” is possible - information with gaps, heterogeneous, indirect, fuzzy, ambiguous, probabilistic.
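Properties 2 and 3 above (proximity of objects and the use of precedents) can be illustrated with a minimal sketch of a nearest-neighbor rule. The function name `nearest_neighbor`, the Euclidean distance as the proximity measure, and the toy precedents are assumptions made for illustration only; they are not taken from the text.

```python
import math

def nearest_neighbor(precedents, x):
    """Return the class label of the precedent closest to feature vector x."""
    best_label, best_dist = None, math.inf
    for features, label in precedents:
        d = math.dist(features, x)  # Euclidean distance as the (assumed) proximity measure
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical precedents: feature vectors with known class labels.
precedents = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"), ((4.8, 5.3), "B"),
]

print(nearest_neighbor(precedents, (0.9, 1.1)))
print(nearest_neighbor(precedents, (5.1, 4.9)))
```

A new object is assigned to the class of the most similar precedent, so the quality of the result depends directly on how well the chosen features and distance reflect real similarity.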

It is advisable to distinguish the following types of recognition tasks:

1. The recognition task proper, that is, assigning a presented object, according to its description, to one of the given classes (supervised learning, or learning with a teacher).

2. The task of automatic classification is the division of a set of objects (situations) according to their descriptions into a system of non-overlapping classes (taxonomy, cluster analysis, unsupervised learning).

3. The problem of choosing an informative set of features in recognition.

4. The problem of reducing the initial data to a form convenient for recognition.

5. Dynamic recognition and dynamic classification - tasks 1 and 2 for dynamic objects.

6. The forecasting task: task 5, in which the decision must refer to some moment in the future.

The concept of an image.

An image, a class is a classification grouping in the system that unites (singles out) a certain group of objects according to some attribute. Images have a number of characteristic properties, which manifest themselves in the fact that acquaintance with a finite number of phenomena from the same set makes it possible to recognize an arbitrarily large number of its representatives.


As an image, one can also consider a certain set of states of the control object, and this whole set of states is characterized by the fact that the same impact on the object is required to achieve a given goal. Images have characteristic objective properties in the sense that different people who learn from different observational material, for the most part, classify the same objects in the same way and independently of each other.

In general, the problem of pattern recognition consists of two parts: training and recognition.

Training is carried out by showing individual objects together with an indication of which image each of them belongs to. As a result of training, the recognition system must acquire the ability to respond with the same reaction to all objects of the same image and with different reactions to objects of different images.

It is very important that the learning process be completed solely by showing a finite number of objects, without any other prompts. The objects of learning can be visual images, various phenomena of the external world, and so on.

Training is followed by the recognition of new objects, which characterizes the operation of an already trained system. Automating these procedures is the problem of learning pattern recognition. When a person works out or invents the classification rules and then imposes them on the computer, the recognition problem is only partially solved, since the main and most important part of the problem (training) is taken over by the person.

The problem of training in pattern recognition is interesting both from an applied and from a fundamental point of view. From an applied point of view, the solution of this problem is important, first of all, because it opens up the possibility of automating many processes that until now have been associated only with the activity of a living brain. The fundamental significance of the problem is connected with the question of what a computer can and cannot do in principle.

When control problems are solved by pattern recognition methods, the term "state" is used instead of the term "image". A state is a certain form of representing the measured current (instantaneous) characteristics of the observed object; a set of states determines a situation.

A situation is usually called a certain set of states of a complex object, each of which is characterized by the same or similar characteristics of the object. For example, if a certain control object is considered as an object of observation, then the situation combines such states of this object in which the same control actions should be applied. If the object of observation is a game, then the situation unites all states of the game.

The choice of the initial description of objects is one of the central tasks of the problem of learning pattern recognition. With a successful choice of the initial description (feature space), the recognition task may turn out to be trivial. Conversely, an unsuccessfully chosen initial description can lead either to a very difficult further processing of information, or to no solution at all.

Geometric and structural approaches.

Any representation that arises from observing an object during training or testing can be described as a vector, and hence as a point in some feature space.

If it is claimed that every representation shown can be unambiguously attributed to one of two (or several) images, this amounts to claiming that some space contains two or more regions with no points in common, and that the representations are points from these regions. Each such region can be given a name corresponding to the image.

Let us interpret the process of learning pattern recognition in terms of a geometric picture, limiting ourselves for now to the case of recognizing only two patterns. The only thing known in advance is that it is required to separate two regions in some space and that only points from these regions are shown. These areas themselves are not predetermined, that is, there is no information about the location of their boundaries or rules for determining whether a point belongs to a particular area.

In the course of training, points randomly selected from these areas are presented, and information is reported about which area the presented points belong to. No additional information about these areas, that is, the location of their boundaries during training, is reported.

The goal of learning is either to build a surface that would separate not only the points shown in the learning process, but also all other points belonging to these areas, or to build surfaces that bound these areas so that each of them contains only points of the same image. In other words, the goal of learning is to construct such functions from image vectors that would be, for example, positive at all points of one image and negative at all points of another image.

Due to the fact that the regions do not have common points, there is always a whole set of such separating functions, and as a result of learning, one of them must be built. If the presented images belong not to two, but to a larger number of images, then the task is to build, according to the points shown during training, a surface that separates all areas corresponding to these images from each other.

This problem can be solved, for example, by constructing a function that takes the same value over the points of each of the regions, and the value of this function over points from different regions should be different.
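For the two-image case described above, a separating function that is positive on one region and negative on the other can be sketched with the classical perceptron update rule. This is only a minimal illustration under stated assumptions: the separating surface is assumed to be a hyperplane, the regions are assumed to be linearly separable, and the names `train_perceptron` and `separating_function`, as well as the sample points, are hypothetical.

```python
def separating_function(w, b, x):
    """Linear function f(x) = w . x + b; its sign indicates the region."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(points, labels, epochs=1000):
    """Adjust w and b until every training point lies on the correct side.

    labels are +1 for the first image and -1 for the second.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            if y * separating_function(w, b, x) <= 0:  # point is misclassified
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:  # all shown points are separated; stop training
            break
    return w, b

# Hypothetical training points drawn from two non-overlapping regions.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
labels = [+1, +1, +1, -1, -1, -1]

w, b = train_perceptron(points, labels)
for x in points:
    print(x, "->", "first image" if separating_function(w, b, x) > 0 else "second image")
```

The result is one of the many possible separating functions; which one is obtained depends on the order in which the points are shown.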

It may seem that knowing just a certain number of points from the area is not enough to separate the entire area. Indeed, one can specify an innumerable number of different regions that contain these points, and no matter how the surface that selects the region is built from them, it is always possible to specify another region that intersects the surface and at the same time contains the points shown.

However, approximating a function from information about it on a limited set of points, much narrower than the entire set on which the function is defined, is a common mathematical problem of function approximation. Of course, solving such problems requires certain restrictions on the class of functions under consideration, and the choice of these restrictions depends on the nature of the information that the teacher can add to the learning process.

One such hint is the conjecture about the compactness of images.

Along with the geometric interpretation of the problem of learning to recognize patterns, there is another approach, which is called structural or linguistic. Let's consider the linguistic approach on the example of visual image recognition.

First, a set of initial concepts is distinguished - typical fragments found in the image, and characteristics of the relative position of the fragments (left, bottom, inside, etc.). These initial concepts form a vocabulary that allows you to build various logical statements, sometimes called sentences.

The task is to select from a large number of statements that could be constructed using these concepts, the most significant for this particular case. Further, looking at a finite and, if possible, a small number of objects from each image, it is necessary to construct a description of these images.

The constructed descriptions must be so complete as to resolve the question of which image the given object belongs to. When implementing the linguistic approach, two tasks arise: the task of constructing an initial dictionary, that is, a set of typical fragments, and the task of constructing description rules from the elements of a given dictionary.

Within the framework of linguistic interpretation, an analogy is drawn between the structure of images and the syntax of a language. The desire for this analogy was caused by the possibility of using the apparatus of mathematical linguistics, that is, the methods are syntactic in nature. The use of the apparatus of mathematical linguistics to describe the structure of images can be applied only after the segmentation of images into component parts has been made, that is, words have been developed to describe typical fragments and methods for their search.

After the preliminary work, which ensures the selection of words, linguistic tasks proper arise, consisting of tasks of automatic grammatical parsing of descriptions for image recognition.

Compactness hypothesis.

If we assume that in the learning process, the feature space is formed based on the planned classification, then we can hope that the specification of the feature space itself sets a property, under the action of which the images in this space are easily separated. It is these hopes that, as work in the field of pattern recognition developed, stimulated the emergence of the compactness hypothesis, which states that compact sets in the feature space correspond to patterns.

By a compact set we will understand a clump of points in the feature space, assuming that such clumps are separated from one another by sparse regions. However, this hypothesis has not always been confirmed experimentally. Problems in which the compactness hypothesis held well always found a simple solution; conversely, problems for which the hypothesis was not confirmed were either not solved at all, or were solved with great difficulty and with additional information.

The compactness hypothesis itself has turned into a sign of the possibility of satisfactorily solving recognition problems.

The formulation of the compactness hypothesis brings us close to the concept of an abstract image. If the coordinates of space are chosen randomly, then the images in it will be distributed randomly. They will be denser in some parts of the space than in others.

Let us call such a randomly chosen space an abstract space. In this abstract space, there will almost certainly be compact sets of points. Therefore, in accordance with the compactness hypothesis, the sets of objects to which compact sets of points correspond in an abstract space are usually called the abstract images of that space.

Learning and self-learning. Adaptation and learning.

If it were possible to identify a universal property that depends neither on the nature of the images nor on their representations, but determines only their separability, then along with the usual task of learning recognition from information about the image to which each object of the training sequence belongs, one could pose a different classification problem: the so-called problem of learning without a teacher.

A task of this kind at the descriptive level can be formulated as follows: objects are presented to the system simultaneously or sequentially without any indication of their belonging to images. The input device of the system maps a set of objects onto a set of images and, using some property of image separability embedded in it beforehand, makes an independent classification of these objects.

After such a process of self-learning, the system should acquire the ability to recognize not only already familiar objects (objects from the training sequence), but also those that have not been presented before. Self-learning of a system is a process as a result of which the system, without the help of a teacher, acquires the ability to produce the same reaction to representations of objects of the same image and different reactions to representations of objects of different images.

The role of the teacher in this case consists only in prompting the system of some objective property that is the same for all images and determines the ability to divide a set of objects into images.

It turns out that such an objective property is the property of compactness of images. The mutual arrangement of points in the selected space already contains information about how the set of points should be divided. This information determines the property of pattern separability, which is sufficient for self-learning of the pattern recognition system.

Most of the well-known self-learning algorithms are able to select only abstract images, that is, compact sets in given spaces. The difference between them lies in how the notion of compactness is formalized. However, this does not reduce, and sometimes even increases, the value of self-learning algorithms, since often the images themselves are not predetermined by anyone, and the task is to determine which subsets of points in a given space constitute images.
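As one possible formalization of compactness, the sketch below groups points around cluster centers in the style of the k-means procedure. This is an illustrative assumption rather than a method named in the text: the function name `k_means`, the choice of initial centers (evenly spaced elements of the input list), and the sample points are all hypothetical.

```python
import math

def k_means(points, k, iterations=20):
    """Split points into k groups without any class labels (self-learning)."""
    # Assumed initialization: take k points evenly spaced in the input list.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move every center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers

# Two hypothetical clumps of points separated by a sparse region.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

No information about class membership is supplied; the grouping follows only from the mutual arrangement of the points, which is exactly the property that the compactness hypothesis relies on.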

An example of such a statement of the problem is sociological research, when groups of people are singled out according to a set of questions. In this understanding of the problem, self-learning algorithms generate previously unknown information about the existence in a given space of images that no one had any idea about before.

In addition, the result of self-learning characterizes the suitability of the chosen space for a specific recognition learning task. If the abstract images allocated in the space of self-learning coincide with the real ones, then the space has been chosen successfully. The more abstract images differ from real ones, the more inconvenient the chosen space for a specific task.

Learning is usually understood as the process of developing in a system a particular reaction to groups of identical external signals by repeatedly applying an external correction to the system. The mechanism that generates this correction almost completely determines the learning algorithm.

Self-learning differs from learning in that no additional information about the correctness of its reaction is reported to the system.

Adaptation is the process of changing the parameters and structure of the system, and possibly control actions, based on current information in order to achieve a certain state of the system with initial uncertainty and changing operating conditions.

Learning is a process, as a result of which the system gradually acquires the ability to respond with the necessary reactions to certain sets of external influences, and adaptation is the adjustment of the parameters and structure of the system in order to achieve the required quality of control in conditions of continuous changes in external conditions.


Speech recognition systems.

Speech acts as the main means of communication between people and therefore speech communication is considered one of the most important components of the artificial intelligence system. Speech recognition is the process of converting an acoustic signal generated at the output of a microphone or telephone into a sequence of words.

A more difficult task is understanding speech, which involves identifying the meaning of the acoustic signal. In this case, the output of the speech recognition subsystem serves as the input of the utterance understanding subsystem. Automatic speech recognition (ASR) is one of the areas of natural language processing technology.

Automatic speech recognition is used in automating the input of texts into computers, in the formation of oral queries to databases or information retrieval systems, in the formation of oral commands to various intelligent devices.

Basic concepts of speech recognition systems.

Speech recognition systems are characterized by many parameters.

One of the main parameters is the word error rate (WER). This parameter is the ratio of the number of incorrectly recognized words to the total number of spoken words.
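A minimal sketch of how such a ratio might be computed is given below. It assumes the common convention of counting errors as the minimum number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1] / len(ref)

print(word_error_rate("turn the lights on", "turn the light on"))
```

With one substituted word out of four spoken words, the sketch reports an error rate of 0.25.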

Other parameters characterizing automatic speech recognition systems are:

1) dictionary size,

2) speech mode,

3) style of speech,

4) subject area,

5) speaker dependence,

6) the level of acoustic noise,

7) the quality of the input channel.

Depending on the size of the dictionary, ASR systems are divided into three groups:

With a small dictionary size (up to 100 words),

With a medium dictionary size (from 100 words to several thousand words),

With a large dictionary size (more than 10,000 words).

Speech mode characterizes the way words and phrases are pronounced. There are systems for recognizing continuous speech and systems that allow recognizing only isolated words of speech. Isolated word recognition mode requires the speaker to pause briefly between words.

According to the style of speech, ASR systems are divided into two groups: deterministic speech systems and spontaneous speech systems.

In deterministic speech recognition systems, the speaker reproduces speech following the grammatical rules of the language. Spontaneous speech is characterized by violations of grammatical rules and is more difficult to recognize.

Depending on the subject area, there are ASR systems intended for highly specialized areas (for example, access to databases) and ASR systems with an unlimited scope. The latter require a large vocabulary and should be able to recognize spontaneous speech.

Many automatic speech recognition systems are speaker dependent. This involves pre-tuning the system to the peculiarities of the pronunciation of a particular speaker.

The complexity of solving the problem of speech recognition is explained by the high variability of acoustic signals. This variability is due to several reasons:

First, the varying realization of phonemes, the basic units of the sound system of a language. This variability is caused by the influence of neighboring sounds in the speech stream. The variants of a phoneme that are determined by its sound environment are called allophones.

Secondly, the position and characteristics of acoustic receivers.

Thirdly, changes in the parameters of the speech of the same speaker, which are due to the different emotional state of the speaker, the pace of his speech.

The figure shows the main components of the speech recognition system:

The digitized speech signal enters the pre-processing unit, where the features necessary for sound recognition are extracted. Sound recognition is often done using artificial neural network models. The selected sound units are subsequently used to search for a sequence of words that best matches the input speech signal.

The search for a sequence of words is performed using acoustic, lexical and language models. The model parameters are determined from the training data based on the respective learning algorithms.

Synthesis of speech by text. Basic concepts

In many cases, creating artificial intelligence systems with elements of speech communication requires messages to be output in spoken form. The figure shows a block diagram of an intelligent question-answer system with a speech interface:

Picture 1.


Consider the features of the empirical approach on the example of recognition of parts of speech. The task is to assign labels to the words of the sentence: noun, verb, preposition, adjective, and the like. In addition, it is necessary to define some additional features of nouns and verbs. For example, for a noun it is a number, and for a verb it is a form. We formalize the task.

Let us represent the sentence as a sequence of words $W = w_1 w_2 \dots w_n$, where the $w_i$ are random variables, each taking one of the possible values from the language dictionary. The sequence of labels assigned to the words of the sentence can be represented as $X = x_1 x_2 \dots x_n$, where the $x_i$ are random variables whose values are defined on the set of possible labels.

Then the problem of part-of-speech recognition is to find the most probable sequence of labels $x_1, x_2, \dots, x_n$ given the sequence of words $w_1, w_2, \dots, w_n$. In other words, it is necessary to find the sequence of labels $X^* = x_1 x_2 \dots x_n$ that maximizes the conditional probability $P(x_1, x_2, \dots, x_n \mid w_1, w_2, \dots, w_n)$.

Let us rewrite the conditional probability as $P(X \mid W) = P(X, W) / P(W)$. Since $P(W)$ does not depend on $X$, maximizing $P(X \mid W)$ over $X$ is equivalent to maximizing the joint probability, so $X^* = \arg\max_X P(X, W)$. The joint probability $P(X, W)$ can be written as a product of conditional probabilities:

$$P(X, W) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1}, w_1, \dots, w_{i-1}) \, P(w_i \mid x_1, \dots, x_i, w_1, \dots, w_{i-1}).$$

Direct search for the maximum of this expression is a difficult task, since for large values of $n$ the search space becomes very large. Therefore, the probabilities in this product are approximated by simpler conditional probabilities: $P(x_i \mid x_{i-1})$ and $P(w_i \mid x_i)$. Here it is assumed that the value of the label $x_i$ is associated only with the previous label $x_{i-1}$ and does not depend on earlier labels, and that the probability of the word $w_i$ is determined only by the current label $x_i$. These assumptions are called Markovian, and the theory of Markov models is used to solve the problem. Taking into account the Markov assumptions, we can write:

$$X^* = \arg\max_{x_1, \dots, x_n} \prod_{i=1}^{n} P(x_i \mid x_{i-1}) \, P(w_i \mid x_i),$$

where the conditional probabilities are estimated from a set of training data (for $i = 1$, the factor $P(x_1 \mid x_0)$ is read as the probability of the first label).

The search for a sequence of labels X* is carried out using the Viterbi dynamic programming algorithm. The Viterbi algorithm can be considered as a variant of the state graph search algorithm, where the vertices correspond to word labels.

Characteristically, for any current vertex, the set of child labels is always the same. Moreover, for each child vertex, the sets of parent vertices also coincide. This is explained by the fact that transitions are made on the state graph, taking into account all possible combinations of labels. Markov's assumption provides a significant simplification of the problem of recognition of parts of speech while maintaining high accuracy of assigning labels to words.
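The search described above can be sketched as follows. This is a minimal illustration of the Viterbi recursion under the Markov assumptions, that is, with label transition probabilities P(x_i | x_{i-1}) and word probabilities P(w_i | x_i). The function name `viterbi`, the three-label tag set, and all probability values are hypothetical and chosen only to make the example runnable; in practice they would be estimated from training data.

```python
import math

FLOOR = 1e-12  # assumed small probability for events absent from the toy tables

def log_p(p):
    """Logarithm of a probability, floored to avoid log(0)."""
    return math.log(max(p, FLOOR))

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable label sequence for the given words."""
    # best[i][t]: log-probability of the best label sequence for words[:i+1] ending in t
    best = [{t: log_p(start_p.get(t, 0)) + log_p(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]  # back[i][t]: the label preceding t on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Every label is a possible parent of every label at the next position.
            prev = max(tags, key=lambda s: best[i - 1][s] + log_p(trans_p[s].get(t, 0)))
            best[i][t] = (best[i - 1][prev] + log_p(trans_p[prev].get(t, 0))
                          + log_p(emit_p[t].get(words[i], 0)))
            back[i][t] = prev
    # Trace the best path backwards from the best final label.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical model parameters.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
```

Because only the best score per label is kept at each position, the work grows linearly with the sentence length and quadratically with the number of labels, instead of exponentially as in direct enumeration.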

So, with 200 tags, the assignment accuracy is approximately 97%.

For a long time, empirical parsing was performed using stochastic context-free grammars. However, they have a significant drawback: the same probability can be assigned to different parses. This is because the probability of a parse is represented as the product of the probabilities of the rules involved in it. If different rules with the same probabilities are used during the analysis, the parses cannot be told apart. The best results are given by a grammar that takes into account the vocabulary of the language.

In this case, the rules include the necessary lexical information, which provides different probability values for the same rule in different lexical environments. Empirical parsing is closer to pattern recognition than to traditional parsing in its classical sense.
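The drawback of purely stochastic context-free grammars can be illustrated with a small sketch. The grammar rules, their probabilities, and the two parses below are hypothetical; they model the familiar ambiguity of attaching a prepositional phrase either to the verb phrase or to the noun phrase.

```python
from fractions import Fraction

# Hypothetical rule probabilities (exact fractions to avoid rounding effects).
rule_prob = {
    "S -> NP VP": Fraction(1),
    "VP -> V NP": Fraction(7, 10),
    "VP -> VP PP": Fraction(3, 10),
    "NP -> Det N": Fraction(7, 10),
    "NP -> NP PP": Fraction(3, 10),
    "PP -> P NP": Fraction(1),
}

def parse_probability(rules_used):
    """Probability of a parse: the product of the probabilities of its rules."""
    p = Fraction(1)
    for rule in rules_used:
        p *= rule_prob[rule]
    return p

# Parse A attaches the prepositional phrase to the verb phrase.
parse_a = ["S -> NP VP", "NP -> Det N", "VP -> VP PP", "VP -> V NP",
           "NP -> Det N", "PP -> P NP", "NP -> Det N"]
# Parse B attaches the prepositional phrase to the object noun phrase.
parse_b = ["S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> NP PP",
           "NP -> Det N", "PP -> P NP", "NP -> Det N"]

print(parse_probability(parse_a), parse_probability(parse_b))
```

Since the two attachment rules happen to have the same probability here, both parses receive the same score, and the grammar alone cannot prefer one of them. Making rule probabilities depend on the words involved removes this tie.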

Comparative studies have shown that in natural language applications the accuracy of empirical parsing is higher than that of traditional parsing.

Methods of automatic pattern recognition and their implementation in optical character recognition (OCR) systems are among the most advanced artificial intelligence technologies. In the development of this technology, Russian scientists occupy leading positions in the world.

An OCR system is understood as a system for automatic pattern recognition that applies special programs to images of characters of printed or handwritten text (for example, entered into a computer through a scanner) and converts them into a format suitable for processing by word processors, text editors, etc.

The abbreviation OCR is sometimes deciphered as Optical Character Reader - a device for optical character recognition or automatic text reading. Currently, such devices in industrial use process up to 100,000 documents per day.

Industrial use involves the input of good to medium quality documents - this is the processing of census forms, tax returns, etc.

We list the features of the subject area that are significant from the point of view of OCR systems:

  • font and size variety of symbols;
  • distortions in the images of symbols (breaks in the images of symbols);
  • distortions during scanning;
  • foreign inclusions in images;
  • combination of text fragments in different languages;
  • a wide variety of character classes that can only be recognized with additional contextual information.

Automatic reading of printed and handwritten texts is a special case of automatic visual perception of complex images. Numerous studies have shown that in order to fully solve this problem, intellectual recognition, i.e., "recognition with understanding," is necessary.

There are three principles on which all OCR systems are based.

  • 1. The principle of the integrity of the image. In the object under study there are always significant parts between which there are relationships. The results of local operations with parts of the image are interpreted only jointly in the process of interpreting integral fragments and the entire image as a whole.
  • 2. The principle of purposefulness. Recognition is a purposeful process of generating and testing hypotheses (finding out what is expected of an object).
  • 3. The principle of adaptability. The recognition system must be capable of self-learning.

Leading Russian OCR systems: FineReader; FineReader Manuscript; FormReader; CuneiForm (Cognitive Technologies); Cognitive Forms (Cognitive Technologies).

The FineReader system is produced by ABBYY, which was founded in 1989. ABBYY develops in two directions: machine vision and applied linguistics. The strategic direction of scientific research and development is the natural language aspect of technologies in the field of machine vision, artificial intelligence and applied linguistics.

CuneiForm GOLD for Windows is described as the world's first self-learning intelligent OCR system; it uses adaptive text recognition technology and supports many languages. For each language, a dictionary is supplied for contextual checking and for improving the quality of recognition results. It recognizes any typographic and typewritten typefaces and fonts produced by printers, with the exception of decorative and handwritten ones, as well as very low-quality texts.

Characteristics of pattern recognition systems. Alongside OCR technologies, special technologies for solving certain classes of automatic pattern recognition problems are of great importance:

  • search for people by photos;
  • search for mineral deposits and weather forecasting based on aerial photography and satellite images in various ranges of light radiation;
  • compiling geographic maps based on the initial information used in the previous task;
  • analysis of fingerprints and drawings of the iris in forensics, security and medical systems.

At the stage of preparation and processing of information, especially when computerizing an enterprise, automating accounting, the task arises of entering a large amount of textual and graphic information into a PC. The main devices for entering graphic information are: a scanner, a fax modem, and less often a digital camera. In addition, using optical text recognition programs, you can also enter (digitize) text information into a computer. Modern software and hardware systems make it possible to automate the input of large amounts of information into a computer, using, for example, a network scanner and parallel text recognition on several computers simultaneously.

Most OCR programs work with a bitmap that is received through a fax modem, scanner, digital camera, or other device. At the first stage, the OCR system must break the page into blocks of text, based on the features of the right and left alignment and the presence of several columns. The recognized block is then split into lines. Despite its apparent simplicity, this is not such an obvious task, since in practice distortion of the page image or of its fragments is inevitable. Even a slight slant causes the left edge of one line to be lower than the right edge of the next, especially when the line spacing is small. As a result, there is a problem of determining the line to which this or that fragment of the image belongs. For example, for letters

The lines are then broken up into contiguous regions of the image that correspond to individual letters; the recognition algorithm makes assumptions about which characters these regions correspond to and then selects a character for each region, as a result of which the page is restored as text characters, as a rule in a given format. OCR systems can achieve recognition accuracy of over 99.9% for clean images composed of ordinary fonts. At first glance, this recognition accuracy seems ideal, but the error rate is still discouraging: if there are approximately 1500 characters per page, then even with a recognition success rate of 99.9% there are one or two errors per page. In such cases, the dictionary check method should be used: if a certain word is not in the system dictionary, the system tries to find a similar one according to special rules. But this still does not allow all errors to be corrected and requires human control of the results.
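The error estimate and the dictionary check can be sketched as follows. The sketch uses the `difflib.get_close_matches` function from the Python standard library as a stand-in for the "special rules" mentioned above; the function names `expected_errors_per_page` and `correct_word`, the similarity cutoff of 0.8, and the sample dictionary are assumptions made for illustration, not a description of any real OCR product.

```python
import difflib

def expected_errors_per_page(chars_per_page, accuracy):
    """Average number of misrecognized characters on a page."""
    return chars_per_page * (1 - accuracy)

def correct_word(word, dictionary, cutoff=0.8):
    """Return word if it is in the dictionary, else the most similar entry, else word itself."""
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# 1500 characters at 99.9% accuracy give about 1.5 errors per page.
print(expected_errors_per_page(1500, 0.999))

# Hypothetical system dictionary and a word with a typical OCR confusion (o -> 0).
dictionary = ["recognition", "resolution", "scanner"]
print(correct_word("rec0gnition", dictionary))
```

A word with no sufficiently similar dictionary entry is left unchanged, which is one reason why human control of the results is still needed.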

Texts encountered in real life are usually far from perfect, and the percentage of recognition errors for "unclean" texts is often unacceptably high. Dirty images are the most obvious problem, because even small smudges can obscure the defining parts of a character or transform one character into another. Inaccurate scanning caused by the "human factor" is also a problem, since the operator sitting at the scanner is simply not able to smooth each scanned page and align it accurately with the edges of the scanner. If the document was photocopied, characters are often broken or merged. Any of these effects can cause the system to err, because some OCR systems assume that a contiguous area of an image must be a single character. A page that extends beyond the scanning area or is skewed produces slightly distorted character images that can confuse the OCR system.

The OCR system software usually works with a large bitmap of the page received from the scanner. Standard-resolution images are obtained by scanning at about 300 dpi. A bilevel (1 bit per pixel) image of an A4 sheet at this resolution takes up about 1 MB of memory.
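As a rough check of the memory figure, the uncompressed size of a page bitmap can be computed from the scanning resolution. The sketch below assumes an A4 sheet of 210 x 297 mm and a bilevel image (1 bit per pixel); under these assumptions, a size of about 1 MB corresponds to roughly 300 dpi.

```python
A4_MM = (210, 297)    # A4 sheet size in millimetres
MM_PER_INCH = 25.4

def bitmap_bytes(dpi: int, bits_per_pixel: int = 1, size_mm=A4_MM) -> int:
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    width_px = round(size_mm[0] / MM_PER_INCH * dpi)
    height_px = round(size_mm[1] / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8

for dpi in (150, 300, 600):
    print(dpi, "dpi:", round(bitmap_bytes(dpi) / 2**20, 2), "MiB")
```

Grayscale or color scans multiply the result by the number of bits per pixel, which is why recognition usually starts from a binarized image.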

The main purpose of OCR systems is to analyze raster information (scanned character) and assign a corresponding character to an image fragment. After the recognition process is completed, OCR systems must be able to preserve the formatting of source documents, assign a paragraph attribute in the right place, save tables, graphics, etc. Modern recognition programs support all known text and graphic formats and spreadsheet formats, as well as HTML and PDF.

Working with OCR systems should not, as a rule, cause any particular difficulties. Most of these systems have a simple automatic "scan and recognize" mode (Scan & Read), and they also support recognition of images from files. However, to achieve the best results a given system is capable of, it is desirable (and often necessary) to adjust it manually in advance to the specific type of text, letterhead layout, and paper quality.

When working with an OCR system, it is very important to be able to choose the recognition language and the type of material to be recognized (typewriter, fax, dot matrix printer, newspaper, etc.); an intuitive user interface also matters. When recognizing texts in which several languages are used, the recognition efficiency depends on the ability of the OCR system to form groups of languages. Some systems already have predefined combinations for the most commonly used languages, such as Russian and English.

At the moment, there is a huge number of programs that support text recognition as one of their features. The leader in this area is the FineReader system. The latest version of the program (6.0) now has tools for developing new systems based on FineReader 6.0 technology. The FineReader 6.0 family includes FineReader 6.0 Professional, FineReader 6.0 Corporate Edition, FineReader Scripting Edition 6.0 and FineReader Engine 6.0. In addition to supporting a huge number of output formats, including PDF, FineReader 6.0 can recognize text directly from PDF files. The new Intelligent Background Filtering technology filters out the texture of the document and the background noise of the image: sometimes a gray or colored background is used to highlight text in a document. This does not prevent a person from reading, but conventional text recognition algorithms have serious difficulties with letters located on top of such a background. FineReader can detect zones containing such text, separate the text from the background of the document, find dots that are smaller than a certain size, and remove them. The contours of the letters are preserved, so that background dots close to these contours do not introduce interference that could degrade the quality of text recognition.

Using the capabilities of modern layout programs, designers often create objects of complex shape, such as wrapping multi-column text around a non-rectangular image. FineReader 6.0 supports the recognition of such objects and their saving in MS Word files. Now complex layout documents will be accurately reproduced in this text editor. Even tables are recognized with maximum accuracy, while maintaining all the possibilities for editing.

ABBYY FormReader is one of ABBYY's recognition programs based on ABBYY FineReader Engine. This program is designed to recognize and process forms that can be filled in manually. ABBYY FormReader can process forms with a fixed layout just as well as forms whose structure can change. The new ABBYY FlexiForm technology was used for recognition.

Leading software manufacturers have licensed Russian information technology for use with their products. The popular software packages Corel Draw (Corel Corporation), FaxLine/OCR & Business Card Wizard (Inzer Corporation) and many others have the CuneiForm OCR library built in. This program became the first OCR system in Russia to receive the MS Windows Compatible Logo.

Readiris Pro 7 is a professional text recognition program. According to the manufacturer, this OCR system differs from its analogues in the highest accuracy of converting ordinary (everyday) printed documents, such as letters, faxes, magazine articles and newspaper clippings, into editable objects (including PDF files). The main advantages of the program are the ability to recognize, more or less accurately, images compressed "to the maximum" (with maximum loss of quality) in the JPEG format, support for digital cameras, auto-detection of page orientation, and support for up to 92 languages (including Russian).

OmniPage 11 is a ScanSoft product. A limited version of this program (OmniPage 11 Limited Edition, OmniPage Lite) is usually bundled with new scanners (in Europe and the US). The developers claim that their program recognizes printed documents with almost 100% accuracy, restoring their formatting, including columns, tables, hyphenation (including hyphenation of parts of words), headings, chapter titles, signatures, page numbers, footnotes, paragraphs, numbered lists, first-line indents, graphs and pictures. It can save to Microsoft Office, PDF and 20 other formats, recognize text from PDF files, and edit in that format. The artificial intelligence system automatically detects and corrects errors after the first manual correction. A new, specially developed software module, "Despeckle", makes it possible to recognize documents of reduced quality (faxes, copies, copies of copies, etc.). Further advantages of the program are the ability to recognize colored text and to make corrections by voice. A version of OmniPage also exists for Macintosh computers.

  • See: Bashmakov A. I., Bashmakov I. A. Intelligent information technologies.


Ministry of Education and Science of the Russian Federation

Novosibirsk State University of Economics and Management "NINH"

Faculty of Information Technology

Department of Applied Information Technologies

Discipline: Fuzzy logic and neural networks

Pattern recognition

Direction: Business informatics (electronic business)

Full name of the student: Ekaterina Vitalievna Mazur

Checked by: Pavlova Anna Illarionovna

Novosibirsk 2016

  • Introduction
  • 1. The concept of recognition
    • 1.1 Development history
    • 1.2 Classification of pattern recognition methods
  • 2. Pattern recognition methods
  • 3. General characteristics of pattern recognition problems and their types
  • 4. Problems and prospects for the development of pattern recognition
    • 4.1 Application of pattern recognition in practice
  • Conclusion

Introduction

For quite a long time, the problem of pattern recognition was considered only from a biological point of view. At the same time, only qualitative characteristics were subjected to observations, which did not allow describing the functioning mechanism.

The concept of cybernetics (the science of the general laws of the processes of control and transmission of information in machines, living organisms and society), introduced by N. Wiener in the middle of the 20th century, made it possible to introduce quantitative methods into questions of recognition, that is, to describe this process (in fact, a natural phenomenon) by mathematical methods.

The theory of pattern recognition is one of the main sections of cybernetics, both in theoretical and applied terms. Thus, the automation of some processes involves the creation of devices capable of responding to changing characteristics of the external environment with a certain number of positive reactions.

The basis for solving problems of this level are the results of the classical theory of statistical solutions. Within its framework, algorithms for determining the class to which a recognizable object can be assigned were built.

The purpose of this work is to get acquainted with the concepts of pattern recognition theory: to reveal the main definitions, to study the history of occurrence, to highlight the main methods and principles of the theory.

The relevance of the topic lies in the fact that at the moment pattern recognition is one of the leading areas of cybernetics. So, in recent years, it has been increasingly used: it simplifies the interaction of a person with a computer and creates the prerequisites for the use of various artificial intelligence systems.


1. The concept of recognition

For a long time, the problem of recognition attracted the attention only of scientists in the field of applied mathematics. The works of R. Fisher, written in the 1920s, led to the formation of discriminant analysis, one of the branches of the theory and practice of pattern recognition. In the 1940s, A. N. Kolmogorov and A. Ya. Khinchin set the goal of separating a mixture of two distributions. In the 1950s and 1960s, on the basis of a large number of works, the theory of statistical decisions appeared. Within the framework of cybernetics, a new direction began to take shape, associated with the development of the theoretical foundations and the practical implementation of mechanisms and systems designed to recognize objects and processes. The new discipline was called "Pattern Recognition".

Pattern recognition (recognition of objects) is the task of identifying an object by its image (optical recognition), audio recording (acoustic recognition) or other characteristics. An image is a classification grouping that unites a group of objects according to some criterion. Images have a characteristic property: acquaintance with a finite number of phenomena from one set makes it possible to recognize an arbitrarily large number of its representatives. In the classical formulation of the recognition problem, the set is divided into parts.

Another basic definition is the concept of a set. In a computer, a set is a collection of non-repeating elements of the same type. "Non-repeating" means that an element is either present in the set or not. The universal set includes all possible elements; the empty set contains none.

The method of assigning an element to some image is called a decision rule. Another important concept is the metric, which determines the distance between the elements of the set. The smaller this distance, the more similar the objects (symbols, sounds, etc.) that we recognize. Usually the elements are specified as sets of numbers, and the metric is specified as some function. The efficiency of the program depends on the choice of image representation and on the implementation of the metric: the same recognition algorithm with different metrics will make mistakes with different frequencies.
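The dependence of the result on the metric can be illustrated with a small sketch. Here the decision rule is "assign the label of the closest reference element"; the reference points, labels and function names are assumptions made up for the example.

```python
def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def chebyshev(a, b):
    """Largest difference along any single feature."""
    return max(abs(x - y) for x, y in zip(a, b))

def nearest_label(sample, references, metric):
    """Decision rule: return the label of the closest reference element."""
    return min(references, key=lambda ref: metric(sample, ref[0]))[1]

# Two labelled reference elements (hypothetical data).
REFERENCES = [((2.0, 2.0), "A"), ((2.5, 0.0), "B")]
sample = (0.0, 0.0)

print(nearest_label(sample, REFERENCES, euclidean))   # B: 2.5 < sqrt(8)
print(nearest_label(sample, REFERENCES, chebyshev))   # A: 2.0 < 2.5
```

The same rule applied to the same data gives different answers, so the choice of metric is part of the recognition algorithm rather than a technical detail.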

Learning is usually understood as the process of developing in some system a particular reaction to groups of similar external signals through their repeated action on the system. Self-learning differs from learning in that no additional information about the correct reaction is given to the system.

Examples of pattern recognition problems are:

Letter recognition;

Barcode recognition;

Recognition of license plates;

Recognition of faces and other biometric data;

Speech recognition, etc.

1.1 Development history

In the late 1950s, F. Rosenblatt developed a model for learning to recognize visual patterns, called the perceptron. Much later, R. Penrose questioned the neural network model of the brain, pointing out the essential role of quantum mechanical effects in its functioning.

Figure 1 - Schematic of the perceptron
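A minimal sketch of a single threshold neuron trained with the perceptron learning rule is given below. It is an illustration rather than Rosenblatt's original model: the training data (the logical AND function), the learning rate and the function names are assumptions.

```python
def predict(weights, bias, x):
    """Weighted sum of the inputs compared with a threshold at zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: after every mistake, shift the weights
    towards (or away from) the misclassified input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:   # every sample is classified correctly
            break
    return weights, bias

# Linearly separable toy problem: logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

The rule is guaranteed to converge only for linearly separable classes; a single neuron of this kind cannot learn, for example, the XOR function.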

Further, various generalizations of the perceptron were invented, and the function of neurons was complicated: neurons could not only multiply input numbers and compare the result with threshold values, but also apply more complex functions to them. Figure 2 shows one of these complications:

Figure 2 - Diagram of the neural network

In addition, the topology of the neural network could be even more complicated. For example, like this:

Figure 3 - Diagram of Rosenblatt's neural network.

Neural networks, while being a complex object for mathematical analysis, make it possible, when used properly, to find highly non-trivial regularities in data. But this advantage is also a source of potential errors. The difficulty of analysis is explained, in the general case, by the complex structure of the network and, as a consequence, by its practically inexhaustible possibilities for generalizing a wide variety of regularities.

1.2 Classification of pattern recognition methods

As we have already noted, pattern recognition is the task of establishing equivalence relations between certain images-models of objects in the real or ideal world.

These relations determine whether recognizable objects belong to certain classes, which are considered as independent units.

When constructing recognition algorithms, these classes can be specified by a researcher who relies on his own ideas or uses additional information about the similarity or difference of objects in the context of a given task. In this case, one speaks of "supervised recognition" (recognition with a teacher). Otherwise, when an automated system solves a classification problem without involving additional information, one speaks of "unsupervised recognition".

In his works, V.A. Duke gives an academic review of recognition methods and distinguishes two main ways of representing knowledge:

Intensional (in the form of a diagram of relationships between attributes);

Extensional with the help of specific facts (objects, examples).

The intensional representation captures the patterns that explain the structure of the data. With regard to diagnostic tasks, such fixation consists in determining operations on the features of objects that lead to the desired result. Intensional representations are implemented through operations on values and do not involve operations on specific objects.

In turn, extensional representations of knowledge are associated with the description and fixation of specific objects from the subject area and are implemented in operations, the elements of which are objects as independent systems.

Thus, the classification of recognition methods proposed by V.A. Duke is built on the fundamental regularities that underlie the human way of cognition in principle. This puts this division into classes in a special position compared with other, less well-known classifications, which look artificial and incomplete against this background.

2. Pattern recognition methods

Enumeration (brute-force) method. In this method, the object is compared with a database that stores, for each object, different variants of how it may appear. For example, in optical image recognition one can enumerate views of the object at different angles, scales, offsets, deformations, etc. For letters, one can enumerate fonts or font properties. In the case of sound pattern recognition, the sample is compared with known patterns (for example, a word spoken by many people). After that, a deeper analysis of the characteristics of the image is performed. In optical recognition, this may be the determination of geometric characteristics; a sound sample is subjected to frequency and amplitude analysis.
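A minimal sketch of enumeration over offsets: a small binary template is compared with every possible window of a binary image, and the offset with the fewest mismatching pixels wins. The image, the template and the function names are assumptions made up for the example; a real system would also enumerate angles, scales and deformations.

```python
def mismatch(image, template, top, left):
    """Count pixels where the template differs from the image window."""
    return sum(
        image[top + r][left + c] != template[r][c]
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def best_match(image, template):
    """Try every offset; return (mismatch_count, top, left) of the best one."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return min(
        (mismatch(image, template, top, left), top, left)
        for top in range(rows)
        for left in range(cols)
    )

IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
PLUS = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

print(best_match(IMAGE, PLUS))   # the template fits exactly at row 1, column 1
```

The cost grows with the number of variants tried, which is why plain enumeration is practical only when the set of possible distortions is small.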

The next method is the use of artificial neural networks (ANN). It requires either a huge number of examples of the recognition task or a special neural network structure that takes the specifics of the task into account. Nevertheless, this method is characterized by high efficiency and performance.

Methods based on estimates of the distribution densities of feature values. Borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in the feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to the initial probabilities of objects belonging to a particular class and conditional feature distribution densities.

The group of methods based on the estimation of the distribution densities of feature values is directly related to the methods of discriminant analysis. The Bayesian approach to decision making is one of the most developed parametric methods in modern statistics: the analytical expression of the distribution law (here, the normal law) is considered known, and only a small number of parameters (mean vectors and covariance matrices) need to be estimated. The main difficulties in applying this method are the need to store the entire training set in order to calculate density estimates and the high sensitivity to the composition of the training set.
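The Bayesian scheme can be sketched for the simplest case of a single feature, where the mean vector and covariance matrix reduce to a mean and a variance. The training values, class labels and function names below are assumptions; each class is modeled by a normal density, and the object is assigned to the class with the largest product of prior probability and density.

```python
import math

def fit_gaussian(values):
    """Estimate the mean and variance of a one-dimensional normal law."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of the normal distribution N(mu, var) at point x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def train(samples_by_class):
    """For every class store (prior, mean, variance)."""
    total = sum(len(values) for values in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        mu, var = fit_gaussian(values)
        model[label] = (len(values) / total, mu, var)
    return model

def classify(x, model):
    """Bayes decision rule: maximize prior * class-conditional density."""
    def score(label):
        prior, mu, var = model[label]
        return prior * gaussian_pdf(x, mu, var)
    return max(model, key=score)

# Hypothetical one-feature training set with two classes.
TRAINING = {
    "short": [1.0, 1.2, 0.8, 1.1, 0.9],
    "tall": [3.0, 3.3, 2.7, 3.1, 2.9],
}
model = train(TRAINING)
print(classify(1.3, model), classify(2.6, model))
```

In this parametric variant only the estimated parameters are kept after training; the need to store the whole training set arises when the densities are estimated non-parametrically.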

Methods based on assumptions about the class of decision functions. In this group, the type of the decision function is considered to be known and its quality functional is given. Based on this functional, the optimal approximation to the decision function is found from the training sequence. The decision rule quality functional is usually associated with an error. The main advantage of the method is the clarity of the mathematical formulation of the recognition problem. The possibility of extracting new knowledge about the nature of an object, in particular knowledge about the mechanisms of interaction of attributes, is fundamentally limited here by a given structure of interaction, fixed in the chosen form of decision functions.

Prototype comparison method. This is the simplest extensional recognition method in practice. It is applied when the recognizable classes form compact geometric groupings in the feature space. The center of the geometric grouping (or the object closest to the center) is then chosen as the prototype point.

To classify an unknown object, the prototype nearest to it is found, and the object is assigned to the same class as this prototype. Obviously, no generalized images are formed in this method. Various types of distances can be used as the measure of proximity.
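The prototype comparison described above can be sketched as follows, taking the center of each grouping as its prototype and the squared Euclidean distance as the measure. The training points, labels and function names are assumptions made up for the example.

```python
def centroid(points):
    """Center of a grouping: the coordinate-wise mean of its points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_prototypes(training):
    """One prototype point per class."""
    return {label: centroid(points) for label, points in training.items()}

def classify_by_prototype(obj, prototypes):
    """Assign the object to the class of the nearest prototype."""
    return min(prototypes, key=lambda label: squared_distance(obj, prototypes[label]))

# Two compact groupings in a two-dimensional feature space (hypothetical data).
TRAINING = {
    "A": [(0, 0), (1, 0), (0, 1), (1, 1)],
    "B": [(5, 5), (6, 5), (5, 6), (6, 6)],
}
prototypes = build_prototypes(TRAINING)
print(prototypes)
print(classify_by_prototype((2, 1), prototypes))
```

After the prototypes are built, the training objects are no longer needed, so recognition requires only one distance computation per class.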

k nearest neighbors method. When classifying an unknown object, a given number (k) of its geometrically nearest neighbors in the feature space, whose class membership is already known, is found. The decision on assigning the unknown object to a class is made by analyzing information about these nearest neighbors. A disadvantage of this method is the need to reduce the number of objects in the training sample (diagnostic precedents), since this reduces the representativeness of the training sample.
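A minimal sketch of the k nearest neighbors rule with a majority vote among the neighbors is shown below. The training sample, the value of k and the function names are assumptions; the squared Euclidean distance is used only as one possible measure.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(obj, training, k=3):
    """Find the k nearest labelled objects and return the most common label."""
    neighbours = sorted(training, key=lambda item: squared_distance(obj, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Labelled training sample (hypothetical data).
TRAINING = [
    ((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
]

print(knn_classify((1, 1), TRAINING, k=3))
print(knn_classify((4, 4), TRAINING, k=3))
```

Every classification scans the whole training sample, which is the computational cost that motivates reducing the sample in the first place.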

Based on the fact that different recognition algorithms behave differently on the same sample, the question arises of a synthetic decision rule that would use the strengths of all algorithms. For this, there is a synthetic method or sets of decision rules that combine the most positive aspects of each of the methods.
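One simple way to combine several decision rules is a majority vote, sketched below. This is a deliberately simplified assumption: more elaborate collective schemes choose which rule to trust depending on the region of the feature space. The three threshold rules and their parameters are hypothetical.

```python
from collections import Counter

def majority_vote(rules, obj):
    """Apply every decision rule and return the most frequent answer."""
    votes = Counter(rule(obj) for rule in rules)
    return votes.most_common(1)[0][0]

# Three hypothetical decision rules over an object with two features.
def rule_first_feature(obj):
    return "B" if obj[0] > 3 else "A"

def rule_second_feature(obj):
    return "B" if obj[1] > 3 else "A"

def rule_sum(obj):
    return "B" if obj[0] + obj[1] > 5 else "A"

RULES = [rule_first_feature, rule_second_feature, rule_sum]

print(majority_vote(RULES, (4, 1)))     # votes: B, A, A
print(majority_vote(RULES, (4, 2.5)))   # votes: B, A, B
```

With an odd number of rules and two classes the vote can never tie, which keeps the combined rule well defined.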

In conclusion of the review of recognition methods, we present the essence of the above in a summary table, adding some other methods used in practice.

Table 1. Classification table of recognition methods, comparison of their areas of application and limitations

| Classification of recognition methods | Application area | Limitations (disadvantages) |
|---|---|---|
| Intensional recognition methods | | |
| Methods based on density estimates | Problems with a known distribution (normal), the need to collect large statistics | The need to enumerate the entire training set during recognition, high sensitivity to non-representativeness of the training set and artifacts |
| Methods based on assumptions about the class of decision functions | Classes should be well separable | The form of the decision function must be known in advance. The impossibility of taking into account new knowledge about correlations between features |
| Logical (Boolean) methods | Problems of small dimension | When selecting logical decision rules, a complete enumeration is necessary. High labor intensity |
| Linguistic methods | | The task of determining the grammar for a certain set of statements (descriptions of objects) is difficult to formalize. Unresolved theoretical problems |
| Extensional recognition methods | | |
| Prototype comparison method | Problems of small dimension of feature space | High dependence of classification results on the metric. Unknown optimal metric |
| k nearest neighbors method | | High dependence of classification results on the metric. The need for a complete enumeration of the training sample during recognition. Computational complexity |
| Estimate calculation algorithms (AVO) | Problems of small dimension in terms of the number of classes and features | Dependence of classification results on the metric. The need for a complete enumeration of the training sample during recognition. High technical complexity of the method |
| Collective decision rules (CRC), a synthetic method | Problems of small dimension in terms of the number of classes and features | Very high technical complexity of the method; a number of unresolved theoretical problems, both in determining the areas of competence of particular methods and in the particular methods themselves |

3. General characteristics of pattern recognition problems and their types

The general structure of the recognition system and its stages are shown in Figure 4:

Figure 4 - The structure of the recognition system

Recognition tasks have the following characteristic stages:

Transformation of initial data to a convenient form for recognition;

Recognition (indicating that an object belongs to a certain class).

In these problems, one can introduce the concept of similarity of objects and formulate a set of rules based on which an object is assigned to one or different classes.

It is also possible to operate with a set of examples, the classification of which is known and which, in the form of given descriptions, can be declared to the recognition algorithm to be adjusted to the task in the learning process.

Difficulties in solving recognition problems are associated with the inability to apply classical mathematical methods without corrections (often the information needed for an accurate mathematical model is not available).

There are the following types of recognition tasks:

The task of recognition is the assignment of the presented object, according to its description, to one of the given classes (supervised learning, or learning with a teacher);

The task of automatic classification is to split the set into a system of non-overlapping classes (taxonomy, cluster analysis, self-learning);

The problem of choosing an informative set of attributes in recognition;

The task of bringing the initial data to a convenient form;

Dynamic recognition and classification;

The task of forecasting - that is, the decision must refer to a certain moment in the future.

There are two most difficult problems in existing recognition systems:

The problem of "1001 classes" - adding 1 class to 1000 existing ones causes difficulties in retraining the system and checking the data obtained before;

The problem of "correlation of vocabulary and sources" is most strongly manifested in speech recognition. Current systems can recognize either a large number of words from a small group of individuals, or few words from a large group of individuals. It is also difficult to recognize a large number of faces with makeup or grimaces.

Neural networks do not solve these problems directly, however, due to their nature, they adapt much more easily to changes in input sequences.

4. Problems and prospects for the development of pattern recognition

4.1 Application of pattern recognition in practice

In general, the pattern recognition problem consists of two parts: learning and recognition. Learning is carried out by showing independent objects with their assignment to one or another class. As a result of training, the recognition system must acquire the ability to respond with the same reactions to all objects of one image and different ones to all others. It is important that in the learning process only the objects themselves and their belonging to the image are indicated. Training is followed by a recognition process that characterizes the actions of an already trained system. Automation of these procedures is the problem.

Before starting the analysis of any object, it is necessary to obtain certain, in some way ordered, accurate information about it. Such information is a set of properties of objects, their display on the set of perceiving organs of the recognizing system.

But each object of observation can act differently, depending on the conditions of perception. In addition, objects of the same image can be very different from each other.

Each mapping of an object onto the perceiving organs of the recognizing system, regardless of its position relative to these organs, is usually called a representation (picture) of the object, and sets of such representations, united by some common properties, are images. With a successful choice of the initial description (feature space), the recognition task can turn out to be quite easy; conversely, an unsuccessfully chosen description can lead to very difficult further processing of information, or even to the absence of a solution.

Recognition of objects, signals, situations and phenomena is the most common task that a person has to solve every second. Huge brain resources are used for this, which can be estimated by such an indicator as the number of neurons, approximately 10^10.

Also, recognition is constantly encountered in technology. Calculations in networks of formal neurons are in many ways reminiscent of information processing by the brain. In the last decade, neurocomputing has become extremely popular and has managed to turn into an engineering discipline associated with the production of commercial products. A large amount of work is underway to create an element base for neurocomputing.

The main characteristic feature of neurocomputers is the ability to solve non-formalized problems for which, for one reason or another, no solution algorithms are available. Neurocomputers offer a relatively simple technology for obtaining algorithms through training. This is their main advantage. Therefore, neurocomputing is relevant right now, in the heyday of multimedia, when global development requires new technologies closely related to pattern recognition.

One of the main problems in the development and application of artificial intelligence remains the recognition of sound and visual images. All other technologies are already ready to find their application in medicine, biology, and security systems. In medicine, pattern recognition helps doctors make more accurate diagnoses; in factories, it is used to predict defects in batches of goods. Biometric identification systems also rely on recognition results as their algorithmic core. Further development and design of computers capable of more direct communication with a person, in natural languages and through speech, is impossible without recognition. This in turn raises the question of developing robotics and artificial control systems that contain recognition systems as vital subsystems.

Conclusion

As a result of the work, a brief overview of the main definitions of the concepts of such a section of cybernetics as pattern recognition was made, recognition methods were identified, and tasks were formulated.

Of course, there are many directions for the development of this science. In addition, as was formulated in one of the chapters, recognition is one of the key areas of development at the moment. Thus, software in the coming decades can become even more attractive to the user and competitive in the modern market if it acquires a commercial format and begins to be distributed within a large number of consumers.

Further research can be directed to the following aspects: in-depth analysis of the main processing methods and the development of new combined or modified methods for recognition. Based on the conducted research, it will be possible to develop a functional recognition system, with which it is possible to test the selected recognition methods for effectiveness.

