
Overview of distributed systems. Distributed system architecture

According to E. Tanenbaum, a well-known expert in the field of computer science, there is no generally accepted and at the same time strict definition of a distributed system. Some wits argue that a distributed system is one in which a malfunction of a computer, whose very existence the users did not even suspect, brings all their work to a halt. A significant part of distributed computing systems, unfortunately, satisfies this definition, but formally it refers only to systems with a single point of failure.

Often, when defining a distributed system, the focus is on the division of its functions among several computers. With this approach, any computing system in which data processing is split between two or more computers is distributed. Building on Tanenbaum's definition, a distributed system can be defined somewhat more narrowly as a set of independent computers connected by communication channels which, from the point of view of a user of some software, look like a single whole.

This approach to defining a distributed system has its drawbacks. For example, all the software used in such a distributed system could run on a single computer, yet by the above definition the resulting system would no longer be distributed. Therefore, the concept of a distributed system should probably be based on an analysis of the software that forms such a system.

As a basis for describing the interaction of two entities, consider the general model of client-server interaction, in which one of the parties (the client) initiates the exchange of data by sending a request to the other party (the server). The server processes the request and, if necessary, sends a response to the client (Fig. 1.1).


Fig. 1.1. The client-server interaction model

Interaction within the client-server model can be either synchronous, when the client waits for the server to process its request, or asynchronous, when the client sends a request to the server and continues its execution without waiting for the server's response. The client-server model can be used as a basis for describing various interactions. For this course, what matters is the interaction of the constituent parts of the software that forms a distributed system.
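
The difference between the two styles can be illustrated with a small sketch. It is not taken from the text: send_request, wait_reply and try_receive_reply are hypothetical stubs standing in for whatever transport (sockets, pipes, message queues) a real client and server would use.

#include <stdio.h>

/* Hypothetical transport stubs: a real client would talk to the server over
 * a socket, a pipe or some other IPC mechanism. */
static void send_request(const char *req)
{
    printf("request sent: %s\n", req);
}

static void wait_reply(char *buf, size_t n)          /* blocks until the reply arrives */
{
    snprintf(buf, n, "reply");
}

static int try_receive_reply(char *buf, size_t n)    /* returns 1 once the reply is ready */
{
    static int polls;
    if (++polls < 3)
        return 0;                 /* pretend the reply has not arrived yet */
    snprintf(buf, n, "reply");
    return 1;
}

static void do_other_work(void) { /* the client keeps computing here */ }

int main(void)
{
    char reply[64];

    /* Synchronous interaction: the client blocks until the server answers. */
    send_request("sync-op");
    wait_reply(reply, sizeof(reply));
    printf("synchronous reply: %s\n", reply);

    /* Asynchronous interaction: the client continues running and only
     * periodically checks whether the reply has arrived. */
    send_request("async-op");
    while (!try_receive_reply(reply, sizeof(reply)))
        do_other_work();
    printf("asynchronous reply: %s\n", reply);
    return 0;
}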


Fig. 1.2. Logical levels of an application

Consider a typical application which, in accordance with modern concepts, can be divided into the following logical levels (Fig. 1.2): the user interface (UI), the application logic (AL), and the data access layer (DA) that works with the database (DB). The user of the system interacts with it through the user interface, the database stores the data describing the application domain, and the application logic level implements all the algorithms related to the subject area.

Since in practice different users of a system are usually interested in accessing the same data, the simplest way to divide the functions of such a system among several computers is to separate the logical levels of the application between a single server part, responsible for data access, and client parts located on several computers that implement the user interface. The application logic can be assigned to the server, to the clients, or shared between them (Fig. 1.3).


Fig. 1.3. Two-tier (client-server) architecture

The architecture of applications built on this principle is called client-server or two-tier. In practice, such systems are often not classified as distributed, but formally they can be considered the simplest representatives of distributed systems.

The development of the client-server architecture is a three-tier architecture, in which the user interface, application logic and data access are separated into independent components of the system that can run on independent computers (Fig. 1.4).


Fig. 1.4. Three-tier architecture

The user's request in such systems is sequentially processed by the client part of the system, the application logic server and the database server. However, a distributed system is usually understood as a system with a more complex architecture than a three-tier one.

In the previous chapter we looked at tightly coupled multiprocessor systems with shared memory, shared kernel data structures and a shared pool from which processes are scheduled. Often, however, it is desirable, for the sake of resource sharing, to allocate processors in such a way that they are autonomous of the operating environment and operating conditions. Suppose, for example, that a user of a personal computer needs to access files located on a larger machine while retaining control over the personal computer. Although some programs, such as uucp, support network file transfer and other network functions, their use is not hidden from the user: the user knows that he is using the network. In addition, programs such as text editors cannot work with remote files as they do with ordinary ones. Users should have the standard set of UNIX system functions and, aside from a possible loss of performance, should not notice the crossing of machine boundaries. For example, the system functions open and read should work with files on remote machines exactly as they do with files belonging to the local system.

The architecture of a distributed system is shown in Figure 13.1. Each computer in the figure is a self-contained unit consisting of a CPU, memory and peripherals. The model does not break down even if a computer has no local file system: it must have peripheral devices to communicate with other machines, and all files belonging to it may be located on another computer. The physical memory available to each machine is independent of the processes running on other machines. In this respect distributed systems differ from the tightly coupled multiprocessor systems discussed in the previous chapter. Accordingly, the kernel of the system on each machine functions independently of the external operating conditions of the distributed environment.

Figure 13.1. Model of a distributed system architecture


Distributed systems, well described in the literature, traditionally fall into the following categories:

Peripheral systems, which are groups of machines strongly coupled to a single (usually larger) machine. The peripheral processors share their load with the central processor and forward all their operating system calls to it. The goal of a peripheral system is to increase overall network performance and to allow a processor to be dedicated to a single process in a UNIX operating environment. The system boots as a single unit; unlike other models of distributed systems, peripheral systems have no real autonomy, except where process scheduling and local memory allocation are concerned.

Distributed systems such as "Newcastle", allowing remote communication by the names of remote files in the library (the name is taken from the article "The Newcastle Connection" - see). Deleted files have a BOM (distinguished name) that, in the search path, contains special characters or an optional name component that precedes the file system root. The implementation of this method does not involve making changes to the system kernel, and therefore it is simpler than the other methods discussed in this chapter, but less flexible.

Fully transparent distributed systems, in which standard compound names are sufficient to refer to files located on other machines; it is up to the kernel to recognize such files as remote. File search paths specified in compound names cross machine boundaries at mount points, no matter how many such points are formed when file systems are mounted on disks.

In this chapter, we will look at the architecture of each model; all information provided is not based on the results of specific developments, but on information published in various technical articles. This assumes that protocol modules and device drivers are responsible for addressing, routing, flow control, and error detection and correction — in other words, that each model is independent of the network being used. The examples of using system functions shown in the next section for peripheral systems work in a similar way for systems like Newcastle and for completely transparent systems, which will be discussed later; therefore, we will consider them in detail once, and in the sections devoted to other types of systems, we will focus mainly on the features that distinguish these models from all others.

13.1 PERIPHERAL PROCESSORS

The architecture of a peripheral system is shown in Figure 13.2. The goal of this configuration is to improve overall network performance by redistributing running processes between the central processor and the peripheral processors. Each peripheral processor has no local peripheral devices at its disposal other than those it needs to communicate with the central processor. The file system and all devices are at the disposal of the central processor. Suppose that all user processes execute on peripheral processors and do not move between them; once assigned to a processor, a process remains on it until completion. The peripheral processor contains a lightweight version of the operating system designed to handle local system calls, interrupt handling, memory allocation, network protocols and the device driver for communication with the central processor.

When the system is initialized on the central processor, the kernel loads the local operating system onto each of the peripheral processors over the communication lines. Any process running on a peripheral is associated with a satellite process belonging to the central processor (see); when a process running on a peripheral processor calls a system function that requires the services of the central processor, the peripheral process communicates with its satellite and the request is sent to the central processor for processing. The satellite process executes the system function and sends the results back to the peripheral processor. The relationship between a peripheral process and its satellite is similar to the client-server relationship we discussed in detail in Chapter 11: the peripheral process acts as a client of its satellite, which supports the file system functions. Here the remote server process has only one client. In Section 13.4 we will look at server processes with multiple clients.


Figure 13.2. Peripheral system configuration


Figure 13.3. Message formats

When a peripheral process calls a system function that can be processed locally, the kernel does not need to send a request to the satellite process. For example, to obtain additional memory a process can call the sbrk function locally. However, if the services of the central processor are required, for example to open a file, the kernel encodes the parameters passed to the called function and the process execution conditions into a message sent to the satellite process (Figure 13.3). The message includes a tag from which it follows that the system function is to be performed by the satellite process on behalf of the client, the parameters passed to the function, and data about the process execution environment (for example, user and group identification codes), which differ from function to function. The remainder of the message is variable-length data (for example, a compound file name or the data to be written by the write function).
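
The message layout of Figure 13.3, as described above, might be sketched as a C structure along the following lines; the field names, types and sizes are illustrative assumptions rather than the format of any actual implementation.

#include <stddef.h>
#include <sys/types.h>

#define RMSG_MAX_DATA 4096       /* assumed upper bound on the variable-length part */

/* Sketch of a request sent by a peripheral process to its satellite. */
struct remote_request {
    int    opcode;               /* tag identifying the system function (open, write, ...) */
    uid_t  uid;                  /* execution environment of the caller */
    gid_t  gid;
    long   params[4];            /* fixed-size parameters: flags, counts, descriptors */
    size_t data_len;             /* length of the variable-length part that follows */
    char   data[RMSG_MAX_DATA];  /* e.g. a compound file name, or the bytes for write */
};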

The satellite process waits for requests from the peripheral process; when a request arrives, it decodes the message, determines the type of system function, executes it, and converts the results into a response sent to the peripheral process. The response, in addition to the results of executing the system function, includes the error code (if any), the signal number, and a variable-length data array containing, for example, information read from a file. The peripheral process is suspended until the response arrives; on receiving it, it decodes the response and passes the results to the user. This is the general scheme for handling operating system calls; now let us move on to a more detailed consideration of individual functions.
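
The satellite's side of the exchange might then look roughly like the sketch below. The reply structure and the recv_request/send_reply/do_* helpers are assumptions introduced for illustration; only the overall decode-execute-reply cycle comes from the text.

#include <stddef.h>

struct remote_request { int opcode; /* parameters and data as in the previous sketch */ };

struct remote_reply {
    long   result;               /* return value of the system function */
    int    error;                /* error code, if the call failed */
    int    signum;               /* signal caught while the call was in progress, or 0 */
    size_t data_len;             /* length of the data that follows (e.g. bytes read) */
    char   data[4096];
};

enum { OP_OPEN = 1, OP_WRITE = 2 };

/* Placeholder transport and handlers; a real satellite would call into the kernel. */
static int  recv_request(struct remote_request *req)   { (void)req; return 0; /* stub */ }
static void send_reply(const struct remote_reply *rep) { (void)rep; }
static long do_open(const struct remote_request *req, struct remote_reply *rep)  { (void)req; (void)rep; return 0; }
static long do_write(const struct remote_request *req, struct remote_reply *rep) { (void)req; (void)rep; return 0; }

void satellite_loop(void)
{
    struct remote_request req;
    struct remote_reply   rep;

    while (recv_request(&req)) {      /* wait until the peripheral sends a request */
        rep.result = -1; rep.error = 0; rep.signum = 0; rep.data_len = 0;
        switch (req.opcode) {         /* decode the system function tag */
        case OP_OPEN:  rep.result = do_open(&req, &rep);  break;
        case OP_WRITE: rep.result = do_write(&req, &rep); break;
        default:       rep.error  = 1;                    break;
        }
        send_reply(&rep);             /* results, error code, pending signal, data */
    }
}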

To explain how a peripheral system works, consider the functions getppid, open, write, fork, exit and signal. The getppid function is fairly straightforward, since it involves simple request and response forms exchanged between the peripheral and the central processor. The kernel on the peripheral processor generates a message with a tag indicating that the requested function is getppid and sends the request to the central processor. The satellite process on the central processor reads the message from the peripheral processor, decodes the type of system function, executes it and obtains the identifier of its parent. It then generates a response and passes it to the waiting peripheral process at the other end of the communication line. When the peripheral processor receives the response, it passes it to the process that called the getppid system function. If the peripheral process stores data such as the identifier of its parent in local memory, it does not have to communicate with its satellite at all.

If the open system function is called, the peripheral process sends a message to its satellite which includes the file name and other parameters. If the call succeeds, the satellite process allocates an index (inode) and an entry in the file table, allocates an entry in the user file descriptor table in its own space, and returns the file descriptor to the peripheral process. All this time, at the other end of the communication line, the peripheral process is waiting for the response. It has no structures at its disposal that would store information about the file being opened; the descriptor returned by open is a pointer to an entry in the satellite process's user file descriptor table. The results of executing the function are shown in Figure 13.4.


Figure 13.4. Calling the open function from a peripheral process

If a call is made to the system function write, the peripheral processor generates a message consisting of the write function tag, the file descriptor and the amount of data to be written. Then, from the address space of the peripheral process, it copies the data to the satellite process over the communication line. The satellite process decodes the received message, reads the data from the communication line and writes it to the corresponding file (the descriptor contained in the message is used to locate the file table entry and, through it, the index of the file); all of these actions are performed on the central processor. On completion, the satellite process sends the peripheral process a message acknowledging the request and containing the number of bytes successfully copied to the file. The read operation is similar; the satellite informs the peripheral process of the number of bytes actually read (when reading data from a terminal or a pipe, this number does not always coincide with the amount specified in the request). Either function may require several information messages to be sent over the network, depending on the amount of data sent and the size of the network packets.
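
A sketch of how one logical write might be split into packet-sized messages is shown below; PACKET_DATA_MAX and send_write_chunk are assumptions standing in for the real packet limit and transport.

#include <stddef.h>

#define PACKET_DATA_MAX 1024          /* assumed payload limit of one network packet */

/* Placeholder: ships one chunk of a write request to the satellite and returns
 * the number of bytes the satellite reports as written, or -1 on error. */
static long send_write_chunk(int fd, const char *buf, size_t len)
{
    (void)fd; (void)buf;
    return (long)len;                 /* stub: pretend everything was written */
}

/* Peripheral-side sketch of write(): the data is sent to the satellite in
 * packet-sized pieces and the byte counts from the replies are summed up. */
long remote_write(int fd, const char *buf, size_t count)
{
    size_t sent = 0;

    while (sent < count) {
        size_t chunk = count - sent;
        if (chunk > PACKET_DATA_MAX)
            chunk = PACKET_DATA_MAX;

        long written = send_write_chunk(fd, buf + sent, chunk);
        if (written < 0)
            return -1;                /* propagate the error to the caller */
        if (written == 0)
            break;                    /* the satellite could not accept more data */
        sent += (size_t)written;
    }
    return (long)sent;
}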

The only system function that requires changes when executed on the central processor is the fork function. When a process executes this function on the CPU, the kernel selects a peripheral processor for it and sends a message to a special server process, informing the latter that it is about to download the current process. Assuming the server has accepted the request, the kernel uses fork to create a new peripheral process, allocating a process table entry and address space. The central processor downloads a copy of the process that called the fork function to the peripheral processor, overwriting the newly allocated address space, spawns a local satellite to communicate with the new peripheral process and sends a message to the peripheral to initialize the program counter of the new process. The satellite process (on the CPU) is a child of the process that called fork; the peripheral process is technically a child of the server process, but logically it is a child of the process that called the fork function. The server process has no logical connection with the child when fork completes; the server's only job is to help download the child. Because of the strong coupling between the system components (the peripheral processors have no autonomy), the peripheral process and the satellite process have the same identifier. The relationship between the processes is shown in Figure 13.5: the solid line shows the parent-child relationship, and the dotted line shows the relationship between peer processes.


Figure 13.5. Executing a fork function on the CPU

When a process executes the fork function on a peripheral processor, it sends a message to its satellite on the CPU, which then executes the entire sequence of actions described above. The satellite selects a new peripheral processor and makes the necessary preparations for downloading the image of the old process: it sends the parent peripheral process a request to read its image, in response to which the transfer of the requested data begins at the other end of the communication channel. The satellite reads the transmitted image and copies it to the peripheral child. When the transfer of the image is finished, the satellite process calls fork, creating its own child on the CPU, and passes the value of the program counter to the peripheral child so that the latter knows at which address to start execution. It would obviously be better if the child of the satellite process were assigned to the peripheral child as its parent, but in our case the spawned processes are able to run on other peripheral processors, not just the one on which they were created. The relationship between the processes at the end of the fork function is shown in Figure 13.6. When the peripheral process finishes its work, it sends a corresponding message to the satellite process, which terminates as well. A satellite process never initiates termination itself.


Figure 13.6. Executing a fork function on a peripheral processor

In both multiprocessor and uniprocessor systems a process must respond to signals in the same way: the process either completes the execution of the system function before checking for signals, or, on the contrary, upon receiving the signal immediately wakes from its suspended state and abruptly interrupts the system function, if this is consistent with the priority at which it was suspended. Since the satellite process executes system functions on behalf of the peripheral process, it must respond to signals in coordination with the latter. If, on a uniprocessor system, a signal causes a process to abort the function, the satellite process on the multiprocessor system should behave in the same way. The same applies when a signal prompts a process to terminate by means of the exit function: the peripheral process terminates and sends the corresponding message to the satellite process, which, of course, terminates as well.

When a peripheral process calls the signal system function, it stores the relevant information in local tables and sends a message to its satellite informing it whether the specified signal should be caught or ignored. It makes no difference to the satellite process whether the signal is caught or the default action is performed. The reaction of a process to a signal depends on three factors (Figure 13.7): whether the signal arrives while the process is executing a system function, whether the signal function has been used to specify that the signal be ignored, and whether the signal originates on the same peripheral processor or on some other one. Let us move on to considering the various possibilities.


algorithm sighandle                   /* signal handling algorithm */
input:  none
output: none
{
    if (the current process is someone's satellite or has a prototype)
    {
        if (the signal is ignored)
            return;                   /* do nothing */
        if (the signal arrived during the execution of a system function)
            place the signal in front of the satellite process;
        else
            send a signal message to the peripheral process;
    }
    else                              /* peripheral process */
    {
        /* whether or not the signal arrived during the execution of a system function */
        send a signal message to the satellite process;
    }
}

algorithm satellite_end_of_syscall    /* completion of a system function called by a peripheral process */
input:  none
output: none
{
    if (an interrupt was received during the execution of the system function)
        send an interrupt signal message to the peripheral process;
    else                              /* the execution of the system function was not interrupted */
        send the reply: set the flag indicating the arrival of the signal;
}

Figure 13.7. Signal processing in the peripheral system


Suppose that a peripheral process has suspended its work while the satellite process performs a system function on its behalf. If the signal occurs elsewhere, the satellite process detects it earlier than the peripheral process. Three cases are possible.

1. If the satellite process, while waiting for some event, was not in a suspended state from which it would be awakened by the signal, it executes the system function to completion, sends the results to the peripheral process and indicates which signal it received.

2. If the process has specified that this type of signal is to be ignored, the satellite continues to follow the algorithm of the system function without leaving the suspended state via longjmp. The response sent to the peripheral process will contain no signal message.

3. If, upon receiving the signal, the satellite process interrupts the execution of the system function (via longjmp), it informs the peripheral process of this and tells it the signal number.

The peripheral process looks for information about the receipt of signals in the received response and, if there is any, processes the signals before returning from the system function. Thus, the behavior of a process in a multiprocessor system corresponds exactly to its behavior in a uniprocessor system: it either terminates without leaving kernel mode, or calls a user-defined signal handling function, or ignores the signal and successfully completes the system function.
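
On the peripheral side, the end of every remote system call might therefore contain a check like the one sketched below; the reply fields and the two helper functions are illustrative assumptions, not actual kernel code.

/* Sketch of the peripheral kernel code that runs when the satellite's reply
 * arrives.  The field names and helpers are assumptions for illustration. */
struct remote_reply {
    long result;                      /* return value of the system function */
    int  error;                       /* error code, if the call failed */
    int  signum;                      /* signal seen by the satellite, or 0 */
};

static void post_signal_to_self(int signum) { (void)signum; /* stub: mark the signal pending */ }
static void handle_pending_signals(void)    { /* stub: the ordinary local signal path */ }

long finish_remote_syscall(const struct remote_reply *rep)
{
    if (rep->signum != 0) {
        /* The satellite reported a signal: deliver it to the local process
         * before the system function returns to user mode. */
        post_signal_to_self(rep->signum);
        handle_pending_signals();
    }
    if (rep->error != 0)
        return -1;                    /* the caller sees the usual error return */
    return rep->result;
}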


Figure 13.8. Interrupt during execution of a system function

Suppose, for example, that a peripheral process calls the read function on a terminal connected to the central processor and suspends itself while the satellite process executes the function (Figure 13.8). If the user presses the break key, the CPU kernel sends a signal to the satellite process. If the satellite was suspended waiting for a portion of data from the terminal, it immediately wakes up and terminates the read function. In its response to the request from the peripheral process, the satellite reports an error code and the signal number corresponding to the interrupt. The peripheral process analyzes the response and, since the message says that an interrupt has arrived, sends the signal to itself. Before returning from the read function, the peripheral kernel checks for signals, detects the interrupt signal received from the satellite process, and processes it in the usual way. If, as a result of receiving the interrupt signal, the peripheral process terminates by means of the exit function, this function takes care of killing the satellite process. If the peripheral process catches interrupt signals, it calls the user-defined signal handling function and returns an error code to the user on return from the read function. On the other hand, if the satellite is executing the stat system function on behalf of the peripheral process, it does not interrupt its execution when a signal is received (the stat function is guaranteed to wake from any suspension, since its wait for resources is bounded in time). The satellite completes the execution of the function and returns the signal number to the peripheral process. The peripheral process sends the signal to itself and receives it on return from the system function.

If a signal occurs on the peripheral processor during the execution of a system function, the peripheral process does not know whether control will soon return to it from the satellite process or whether the latter has gone into a suspended state indefinitely. The peripheral process sends the satellite a special message informing it of the occurrence of the signal. The kernel on the CPU decodes the message and sends the signal to the satellite, whose reaction to receiving the signal is described in the preceding paragraphs (abnormal termination of the function or its completion). The peripheral process cannot send the message to the satellite directly, because the satellite is busy executing the system function and is not reading data from the communication line.

Returning to the read example, note that the peripheral process has no idea whether its satellite is waiting for input from the terminal or performing other actions. The peripheral process sends the satellite a signal message: if the satellite is suspended at an interruptible priority, it immediately wakes up and terminates the system function; otherwise, the function runs to normal completion.

Finally, consider the case of a signal arriving at a time not associated with the execution of a system function. If the signal originated on another processor, the satellite receives it first and sends a signal message to the peripheral process, whether the signal concerns the peripheral process or not. The peripheral kernel decodes the message and sends the signal to the process, which reacts to it in the usual way. If the signal originated on the peripheral processor, the process performs the standard actions without resorting to the services of its satellite.

When a peripheral process sends a signal to other peripheral processes, it encodes a kill request in a message and sends it to the satellite process, which executes the kill function locally. If some of the processes for which the signal is intended reside on other peripheral processors, their satellites receive the signal (and react to it as described above).

13.2 THE NEWCASTLE CONNECTION

In the previous section we considered a type of tightly coupled system characterized by sending all calls to the file management subsystem that arise on a peripheral processor to the remote (central) processor. We now turn to systems with a weaker coupling, consisting of machines that access files located on other machines. In a network of personal computers and workstations, for example, users often access files located on a large machine. In the next two sections we will look at system configurations in which all system functions are performed in the local subsystems, but it is nevertheless possible to access files located on other machines through the functions of the file management subsystem.

These systems use one of two approaches to identifying remote files. In some systems a special character is added to the compound file name: the name component preceding this character identifies the machine, while the rest of the name identifies the file on that machine. Thus, for example, the compound name


"sftig! / fs1 / mjb / rje"


identifies the file "/ fs1 / mjb / rje" on the machine "sftig". This file identification scheme follows the uucp convention for transferring files between UNIX-like systems. In another scheme, deleted files are identified by adding a special prefix to the name, for example:


/../sftig/fs1/mjb/rje


where "/../" is a prefix indicating that the file is deleted; the second component of the filename is the name of the remote machine. This scheme uses the familiar UNIX file name syntax, so unlike the first scheme, user programs do not need to adapt to the use of names with unusual construction (see).


Figure 13.9. Formulating requests to the file server (processor)


We will devote the rest of this section to a model of a system using a Newcastle connection, in which the kernel is not concerned with recognizing remote files; this function is entirely assigned to subroutines from the standard C library, which in this case play the role of the system interface. These routines analyze the first component of the file name, which in both of the identification schemes described contains an indication that the file is remote. This is a departure from the usual practice, in which library routines do not parse file names. Figure 13.9 shows how requests to a file server are formulated. If the file is local, the local system kernel processes the request in the normal way. Consider the opposite case:


open ("/../ sftig / fs1 / mjb / rje / file", O_RDONLY);


The open subroutine in the C library parses the first two components of the compound file name and learns that it must look for the file on the remote machine "sftig". To know whether the process has already established a connection with the given machine, the subroutine maintains a special structure in which it records this fact and, if there is no connection yet, establishes one with the file server running on the remote machine. When the process formulates its first request for remote processing, the remote server acknowledges the request, records the user and group identification codes in the corresponding fields if necessary, and creates a satellite process that will act on behalf of the client process.
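
A rough sketch of such an interception layer is shown below. The is_remote check, the connection cache, remote_open_on_server and the RFD_BASE encoding of remote descriptors are all assumptions made for the sake of illustration; they are not the actual Newcastle library.

#include <fcntl.h>
#include <string.h>

#define RFD_BASE 1000   /* assumption: descriptors >= RFD_BASE denote remote files */

/* Hypothetical helpers: connection caching and the open request sent to the
 * file server; their implementation is not shown here. */
static int is_remote(const char *path)
{
    return strncmp(path, "/../", 4) == 0;
}

static int connect_to_server(const char *path)            /* reuse or create a connection */
{
    (void)path;
    return 0;                                              /* stub */
}

static int remote_open_on_server(int conn, const char *path, int flags)
{
    (void)conn; (void)path; (void)flags;
    return 0;                                              /* stub: slot in the satellite's table */
}

/* Library replacement for open(): local files go to the kernel unchanged,
 * remote files are opened through the satellite on the file server. */
int lib_open(const char *path, int flags)
{
    if (!is_remote(path))
        return open(path, flags);                          /* ordinary local open */

    int conn = connect_to_server(path);
    if (conn < 0)
        return -1;

    int rfd = remote_open_on_server(conn, path, flags);
    if (rfd < 0)
        return -1;
    return RFD_BASE + rfd;     /* read/write can later tell that this file is remote */
}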

To fulfill client requests, the satellite must have the same file permissions on the remote machine as the client. In other words, the user "mjb" must have the same access rights to remote files as to local ones. Unfortunately, it is possible that the client identification code of "mjb" coincides with the identification code of a different client on the remote machine. Thus, the system administrators of the machines on the network must either ensure that each user is assigned an identification code unique to the entire network, or perform code conversion at the time a network service request is formulated. If this is not done, the satellite process will have the rights of some other client on the remote machine.

A more delicate issue is obtaining superuser rights with respect to remote files. On the one hand, the superuser of a client machine should not have superuser rights over the remote system, so as not to bypass the security controls of the remote system. On the other hand, some programs simply cannot work unless they are granted superuser rights. An example of such a program is mkdir (see Chapter 7), which creates a new directory. The remote system would not allow the client to create a new directory, because superuser rights do not extend to the remote machine. The problem of creating remote directories is a serious reason to revise the mkdir system function, extending it so that it establishes automatically all the connections the user needs. Nevertheless, it remains a general problem that setuid programs (such as mkdir) acquire superuser privileges over remote files. Perhaps the best solution would be to give files additional characteristics describing access to them by remote superusers; unfortunately, this would require changes to the structure of the disk index (namely, adding new fields) and would create too much disorder in existing systems.

If the open subroutine succeeds, the local library leaves a corresponding note in a structure accessible to the user process, containing the address of the network node, the process identifier of the satellite process, the file descriptor and other similar information. The read and write library routines determine from the file descriptor whether the file is remote and, if so, send a message to the satellite. The client process interacts with its satellite in all cases where a system function requires the services of the remote machine. If a process accesses two files located on the same remote machine, it uses a single satellite, but if the files are located on different machines, two satellites are used: one on each machine. Two satellites are also used when two processes access a file on a remote machine. When invoking a system function via the satellite, the process generates a message that includes the function number, the search path name and other necessary information, similar to that included in the message structure of the system with peripheral processors.

The mechanism for performing operations on the current directory is more complex. When a process selects a remote directory as its current directory, the library routine sends a message to the satellite, which changes the current directory, and the routine remembers that the directory is remote. In all cases where the search path name begins with a character other than slash (/), the routine sends the name to the remote machine, where the satellite process resolves it starting from the current directory. If the current directory is local, the routine simply passes the search path name to the local system kernel. The chroot system function on a remote directory is handled similarly, but its execution goes unnoticed by the local kernel; strictly speaking, the process could ignore this operation, since only the library records its execution.

When a process calls fork, the corresponding library routine sends messages to each satellite. The satellite processes fork and send the identifiers of their children to the parent client. The client process then runs the fork system function, which transfers control to the child it spawns; the local child carries on a dialogue with the remote satellite child, whose addresses are stored by the library routine. This interpretation of the fork function makes it easier for the satellite processes to keep track of open files and current directories. When a process working with remote files exits (by calling the exit function), the routine sends messages to all of its remote satellites so that they do the same upon receiving the message. Certain aspects of the implementation of the exec and exit system functions are discussed in the exercises.

The advantage of a Newcastle connection is that a process's access to remote files becomes transparent (invisible to the user), and no changes need to be made to the system kernel. However, this approach has a number of disadvantages. First of all, it can reduce system performance. Because of the extended C library, the amount of memory used by each process increases, even if the process never accesses remote files; the library duplicates kernel functions and requires more memory space. Larger processes take longer to start up and can create more contention for memory resources, creating the conditions for more frequent swapping and paging of tasks. Local requests execute more slowly because each call into the kernel takes longer, and the processing of remote requests may also slow down, since the cost of sending them over the network increases. The additional processing of remote requests at user level increases the number of context switches and of swapping and paging operations. Finally, in order to access remote files, programs must be recompiled with the new libraries; old programs and previously delivered object modules cannot work with remote files without this. All of these disadvantages are absent from the system described in the next section.

13.3 "TRANSPARENT" DISTRIBUTED FILE SYSTEMS

The term "transparent allocation" means that users on one machine can access files on another machine without realizing that they are crossing machine boundaries, just as they are on their machine when they are switching from one file system to another traverse the mount points. The names by which processes refer to files located on remote machines are similar to the names of local files: there are no distinctive characters in them. In the configuration shown in Figure 13.10, the directory "/ usr / src" belonging to machine B is "mounted" in the directory "/ usr / src" belonging to machine A. the same system source code, traditionally found in the "/ usr / src" directory. Users running on machine A can access files located on machine B using the usual syntax of writing file names (for example: "/usr/src/cmd/login.c"), and the kernel itself decides whether the file is remote or local. Users running on machine B have access to their local files (unaware that users of machine A can access the same files), but, in turn, do not have access to files located on machine A. Of course , other options are possible, in particular, those in which all remote systems are mounted at the root of the local system, so that users can access all files on all systems.


Figure 13.10. File systems after remote mount

The similarity between mounting local file systems and allowing access to remote file systems has prompted the adaptation of the mount function to remote file systems. In this case, the kernel has an extended-format mount table at its disposal. When executing the mount function, the kernel establishes a network connection with the remote machine and stores information characterizing this connection in the mount table.
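
The extended mount table entry might be sketched as follows; the exact fields are an assumption, the point being only that a remote mount records the network connection alongside the usual local information.

#include <sys/types.h>

/* Sketch of an entry in the extended mount table. */
struct ext_mount {
    dev_t  local_dev;          /* device number seen by the local kernel */
    long   covered_index;      /* index (inode) covered by this mount point */
    int    is_remote;          /* 0: ordinary local file system */
    char   host[64];           /* name or address of the remote machine */
    int    net_conn;           /* handle of the network connection established by mount */
};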

An interesting problem has to do with path names that include "..". If a process makes a directory in a remote file system its current directory, then the use of ".." in a name should eventually return the process to the local file system rather than allow it to access files above the root of the remote file system. Returning again to Figure 13.10, note that when a process belonging to machine A, having previously selected as its current directory "/usr/src/cmd", which is located in the remote file system, executes the command

cd ../../..

the current directory will become the root directory of machine A, not machine B. The namei algorithm running in the kernel of the remote system, upon receiving the sequence of characters "..", checks whether the calling process is an agent of a client process and, if so, whether the client treats the current working directory as the root of the remote file system.

Communication with a remote machine takes one of two forms: a remote procedure call or a remote system function call. In the first form, each kernel procedure dealing with indexes checks whether the index points to a remote file and, if so, sends a request to the remote machine to perform the specified operation. This scheme fits naturally into the abstract structure of support for file systems of various types described in the final part of Chapter 5. Thus, an access to a remote file may initiate the transfer of several messages over the network, their number being determined by the number of implied operations on the file, with a corresponding increase in the response time to the request because of the network delays involved. Each set of remote operations includes, at a minimum, actions to lock the index, maintain the reference count, and so on. In order to improve this model, various optimizations have been proposed, such as combining several operations into one request (message) and caching the most important data (see).


Figure 13.11. Opening a remote file


Consider a process that opens the remote file "/usr/src/cmd/login.c", where "src" is the mount point. Parsing the file name (using the namei-iget scheme), the kernel detects that the file is remote and sends a request to the machine on which it resides to obtain a locked index. Having received the desired response, the local kernel creates an in-memory copy of the index corresponding to the remote file. The kernel then checks the necessary access rights to the file (for reading, for example) by sending another message to the remote machine. The open algorithm continues in full accordance with the plan outlined in Chapter 5, sending messages to the remote machine as needed, until the algorithm is complete and the index is freed. The relationship between the kernel data structures on completion of the open algorithm is shown in Figure 13.11.

If the client calls the read system function, the client kernel locks the local index, sends a request to lock the remote index, issues a read request, copies the data into local memory, sends a request to free the remote index, and frees the local index. This scheme is consistent with the semantics of the existing uniprocessor kernel, but the frequency of network use (several calls for each system function) reduces the performance of the entire system. To reduce the flow of messages on the network, however, several operations can be combined into a single request. In the read example, the client can send the server a single general "read" request, and the server itself decides to lock and release the index while executing it. Network traffic can also be reduced by using remote buffers (as discussed above), but care must be taken to ensure that the system file functions using these buffers are executed correctly.
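
The difference between the two message patterns can be sketched as follows; the rpc_* calls are placeholders, each standing for one network round trip.

#include <stddef.h>

/* Placeholder remote operations; each call costs one network round trip. */
static void rpc_lock_inode(int rindex)   { (void)rindex; }
static void rpc_unlock_inode(int rindex) { (void)rindex; }
static long rpc_read(int rindex, char *buf, size_t n, long off)
{
    (void)rindex; (void)buf; (void)off;
    return (long)n;                                   /* stub */
}
static long rpc_read_combined(int rindex, char *buf, size_t n, long off)
{
    (void)rindex; (void)buf; (void)off;
    return (long)n;                                   /* stub */
}

/* Fine-grained form: the client drives every step itself, so a single read()
 * costs several messages. */
long remote_read_fine(int rindex, char *buf, size_t n, long off)
{
    rpc_lock_inode(rindex);                           /* message 1 */
    long got = rpc_read(rindex, buf, n, off);         /* message 2 (plus the data) */
    rpc_unlock_inode(rindex);                         /* message 3 */
    return got;
}

/* Combined form: a single "read" request; the server locks and releases the
 * index itself, so only one round trip is needed. */
long remote_read_coarse(int rindex, char *buf, size_t n, long off)
{
    return rpc_read_combined(rindex, buf, n, off);
}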

In the second form of communication with a remote machine (a remote system function call), the local kernel determines that the system function refers to a remote file and sends the parameters specified in the call to the remote system, which executes the function and returns the results to the client. The client machine receives the results and returns from the call. Most system functions can be performed with a single network request, the response arriving after a reasonable time, but not all functions fit this model. For example, on receipt of certain signals the kernel creates a file for the process called "core" (Chapter 7). The creation of this file is not associated with any particular system function but ends up performing several operations, such as creating the file, checking permissions and performing a series of writes.

In the case of the open system function, the request to execute the function sent to the remote machine includes the part of the file name remaining after the components of the search path that distinguish the remote file have been excluded, as well as various flags. In the earlier example of opening the file "/usr/src/cmd/login.c", the kernel sends the name "cmd/login.c" to the remote machine. The message also includes credentials such as user and group identification codes, which are needed to verify file permissions on the remote machine. If a response arrives from the remote machine indicating that the open function succeeded, the local kernel fetches a free index in the memory of the local machine, marks it as the index of a remote file, stores information about the remote machine and the remote index, and routinely allocates a new entry in the file table. In contrast to the real index on the remote machine, the index owned by the local machine is formal, and it does not violate the configuration of the model, which is broadly the same as the configuration used when calling a remote procedure (Figure 13.11). If a function called by a process accesses a remote file by its descriptor, the local kernel knows from the (local) index that the file is remote, formulates a request containing the called function and sends it to the remote machine. The request contains a pointer to the remote index, by which the satellite process can identify the remote file itself.

Having received the result of executing a system function, the kernel may resort to the services of a special program to process it (on completion of which the kernel finishes working with the function), since the local processing of results used in a uniprocessor system is not always suitable for a system with several processors. As a consequence, changes in the semantics of the system algorithms are possible, aimed at supporting the execution of remote system functions. At the same time, however, only a minimal flow of messages circulates in the network, ensuring the minimum response time of the system to incoming requests.

13.4 A DISTRIBUTED MODEL WITHOUT SATELLITE PROCESSES

The use of satellite processes in a transparent distributed system makes it easier to keep track of remote files, but the process table of the remote system becomes overloaded with satellites that are idle most of the time. In other schemes, special server processes are used to handle remote requests (see). The remote system has a set (pool) of server processes that it assigns from time to time to process incoming remote requests. After processing a request, a server process returns to the pool and enters a state in which it is ready to process other requests. The server does not save the user context between two calls, because it may process requests from several processes in turn. Therefore, each message arriving from a client process must include information about its execution environment, namely: user identification codes, current directory, signals, and so on.

When a process opens a remote file, the remote kernel assigns an index for subsequent references to the file. The local machine maintains a user file descriptor table, a file table and an index table with the usual set of entries, the index table entry identifying the remote machine and the remote index. In cases where a system function (for example, read) uses the file descriptor, the kernel sends a message pointing to the previously assigned remote index and transfers information related to the process: user identification code, maximum file size, and so on. If the remote machine has a server process at its disposal, the interaction with the client takes the form described earlier; however, the connection between client and server is established only for the duration of the system function.
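
A server pool of this kind might be organized roughly as in the sketch below; the request structure, the queue functions and the pool size are assumptions used only to illustrate that the servers keep no per-client state between requests.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NSERVERS 4

/* Every request carries the client's whole execution context, because the
 * server that picks it up remembers nothing from the client's previous call. */
struct server_request {
    int   opcode;              /* which system function to perform */
    long  remote_index;        /* index assigned by the remote kernel at open time */
    uid_t uid;                 /* client credentials, limits, current directory, ... */
};

/* Stubs standing in for the shared request queue and the actual work. */
static struct server_request dummy;
static struct server_request *get_next_request(void) { sleep(1); return &dummy; }
static void execute(struct server_request *req)      { (void)req; }
static void send_result(struct server_request *req)  { (void)req; }

static void server_main(void)
{
    for (;;) {
        struct server_request *req = get_next_request();  /* wait in the pool */
        execute(req);                                      /* uses only what is in req */
        send_result(req);                                  /* then return to the pool */
    }
}

int main(void)
{
    for (int i = 0; i < NSERVERS; i++)
        if (fork() == 0) {
            server_main();                                 /* each child is one pool server */
            _exit(0);
        }
    for (int i = 0; i < NSERVERS; i++)
        wait(NULL);                                        /* servers normally run forever */
    return 0;
}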

Using servers instead of satellite processes can make the management of data traffic, signals and remote devices more difficult. If large numbers of requests arrive at a remote machine and there are not enough servers, they must be queued. This requires a higher-level protocol than the one used on the underlying network. In the satellite model, on the other hand, this oversaturation cannot occur, because all client requests are processed synchronously: a client can have at most one outstanding request.

Handling signals that interrupt the execution of a system function is also more complicated when servers are used, since the remote machine has to find the server that is serving the execution of the function. It is even possible that, because all servers are busy, the request for the system function is still waiting to be processed. Race conditions also arise when the server returns the result of the system function to the calling process and its response includes the sending of a corresponding signal message across the network. Each message must be marked so that the remote system can recognize it and, if necessary, terminate the server processes. With satellites, the process that handles the fulfillment of a client's request is identified automatically, and when a signal arrives it is easy to check whether the request has been processed or not.

Finally, if a system function called by the client causes the server to be suspended indefinitely (for example, when reading data from a remote terminal), the server cannot process other requests and the server pool is depleted. If several processes access remote devices at once and the number of servers is bounded from above, there is a quite tangible bottleneck. This does not happen with satellites, since a satellite is allocated to each client process. Another problem with using servers for remote devices will be covered in Exercise 13.14.

Despite the advantages that the use of satellite processes provides, their demand for free entries in the process table becomes so acute in practice that in most cases the services of server processes are still used to handle remote requests.


Figure 13.12. Conceptual diagram of interaction with remote files at the kernel level

13.5 CONCLUSIONS

In this chapter we have considered three schemes for working with files located on remote machines, treating remote file systems as an extension of the local one. The architectural differences between these schemes are shown in Figure 13.12. All of them, in turn, differ from the multiprocessor systems described in the previous chapter in that the processors here do not share physical memory. A peripheral processor system consists of a tightly coupled set of processors sharing the file resources of the central processor. A Newcastle-type connection provides hidden ("transparent") access to remote files, not by means of the operating system kernel, but through a special C library. For this reason all programs that intend to use this type of connection must be recompiled, which in general is a serious drawback of the scheme. The remoteness of a file is indicated by a special sequence of characters describing the machine on which it is located, and this is another factor limiting the portability of programs.

In transparent distributed systems, a modification of the mount system function is used to gain access to remote files. Indexes on the local system are marked as belonging to remote files, and the local kernel sends the remote system a message describing the requested system function, its parameters and the remote index. Communication in a "transparent" distributed system is supported in two forms: a remote procedure call (a message containing a list of operations on the index is sent to the remote machine) and a remote system function call (the message describes the requested function). The final part of the chapter discussed issues related to processing remote requests by means of satellite processes and servers.

13.6 EXERCISES

*1. Describe the implementation of the exit system function in a system with peripheral processors. How does this case differ from that in which a process exits on receipt of an uncaught signal? How should the kernel dump the contents of memory?

2. Processes cannot ignore SIGKILL signals; explain what happens in the peripheral system when a process receives such a signal.

*3. Describe the implementation of the exec system function in a system with peripheral processors.

*4. How should the central processor distribute processes among the peripheral processors in order to balance the overall load?

*5. What happens if a peripheral processor does not have enough memory to accommodate all the processes offloaded to it? How should processes be swapped out and swapped in over the network?

6. Consider a system in which requests to a remote file server are sent when a special prefix is found in the file name. Let a process call execl("/../sftig/bin/sh", "sh", 0); The executable file is on a remote machine but must run on the local system. Explain how the remote executable module is transferred to the local system.

7. If the administrator needs to add new machines to an existing system with a connection like Newcastle, then what is the best way to inform the C library modules about this?

*8. During the execution of the exec function, the kernel overwrites the address space of the process, including the library tables used by the Newcastle connection to keep track of references to remote files. After executing the function, the process must retain the ability to access these files by their old descriptors. Describe how this could be implemented.

*9. As shown in Section 13.2, calling the exit system function on systems with a Newcastle connection results in a message being sent to the satellite process, forcing the latter to terminate. This is done at the level of library routines. What happens when a local process receives a signal that tells it to exit while in kernel mode?

*10. In a system with a Newcastle link, where remote files are identified by prefixing the name with a special prefix, how can a user, specifying ".." (parent directory) as the filename component, traverse the remote mount point?

11. We know from Chapter 7 that various signals cause the process to dump the contents of memory into the current directory. What should happen if the current directory is from the remote file system? What answer would you give if the system uses a relationship like Newcastle?

*12. What implications for local processes would it have if all satellite or server processes were removed from the system?

*13. Consider how to implement the link algorithm in a transparent distributed system, whose parameters may be two remote file names, as well as the exec algorithm, which involves performing several internal read operations. Consider two forms of communication: a remote procedure call and a remote system function call.

*14. When accessing a device, a server process can enter a suspended state from which it will be woken by the device driver. Naturally, if the number of servers is limited, the system will no longer be able to satisfy the requests of the local machine. Devise a reliable scheme whereby not all server processes are suspended waiting for device-related I/O to complete. A system function should not remain unfinished merely because all servers are busy.


Figure 13.13. Terminal Server Configuration

*15. When a user logs into the system, the terminal line discipline stores the information that the terminal is the control terminal heading a group of processes. For this reason, when the user presses the "break" key on the terminal keyboard, all processes in the group receive the interrupt signal. Consider a system configuration in which all terminals are physically connected to one machine, but user registration is logically implemented on other machines (Figure 13.13). In each such case the system creates a getty process for the remote terminal. If requests to the remote system are processed by a pool of server processes, note that when executing the open procedure the server is suspended waiting for the connection; when the open function completes, the server returns to the pool, severing its connection to the terminal. How is the interrupt signal triggered by pressing the "break" key sent to the processes belonging to the same group?

*16. Memory sharing is a feature inherent in local machines. From a logical point of view, a common area of physical memory (local or remote) could be allocated to processes belonging to different machines. Describe how this could be implemented.

*17. The process swapping and paging algorithms discussed in Chapter 9 assume the use of a local swap device. What changes would have to be made to these algorithms to support remote swap devices?

*18. Suppose that a remote machine (or the network) suffers a fatal failure and the local network layer protocol records this fact. Develop a recovery scheme for a local system that makes requests to a remote server. In addition, develop a recovery scheme for a server system that has lost contact with its clients.

*19. When a process accesses a remote file, it is possible that the process will traverse several machines in search of the file. Take, for example, the name "/usr/src/uts/3b2/os", where "/usr" is a directory belonging to machine A, "/usr/src" is the mount point of the root of machine B, and "/usr/src/uts/3b2" is the mount point of the root of machine C. Traveling through several machines to the final destination is called a multihop. However, if a direct network connection exists between machines A and C, sending data through machine B would be inefficient. Describe the features of implementing multihop in a system with a Newcastle connection and in a "transparent" distributed system.

In large holdings, tens of thousands of users work in subsidiaries. Each organization has its own internal business processes: approval of documents, issuing of instructions, and so on. At the same time, some processes cross the boundaries of a single company and affect the employees of another. For example, the head of the head office issues an order to a subsidiary, or an employee of a subsidiary sends an agreement for approval to the lawyers of the parent company. This requires a complex architecture involving multiple systems.

Moreover, within one company many systems are used to solve different problems: an ERP system for accounting operations, separate installations of ECM systems for organizational and administrative documentation, for design estimates, etc.

The DIRECTUM system will help to ensure the interaction of different systems both within the holding and at the level of one organization.

DIRECTUM provides convenient tools for building a managed distributed architecture and solving the following tasks:

  • organization of end-to-end business processes and data synchronization between several systems of the same company and in the holding;
  • providing access to data from different installations of ECM systems. For example, search for a document in several specialized systems: with financial documentation, with design and estimate documentation, etc.
  • administration of many systems and services from a single point of management and creation of a comfortable IT infrastructure;
  • convenient distribution of development to distributed production systems.

Components of a Managed Distributed Architecture

Interconnection Mechanisms (DCI)

DCI mechanisms are used to organize end-to-end business processes and synchronize data between different systems within one or several organizations (holding).


The solution connects local business processes existing in companies into a single end-to-end process. Employees and their managers work with the already familiar interface of tasks, documents and reference books. At the same time, the actions of employees are transparent at every stage: they can see the text of the correspondence with a related company, see the status of document approval with the parent organization, etc.

Various DIRECTUM installations and other classes of systems (ERP, CRM, etc.) can be connected to DCI. As a rule, installations are divided by areas of business, taking into account the territorial or legal location of organizations and other factors.

Together with DCI, development components are supplied with a detailed description and code examples, thanks to which a developer can create an algorithm for the business processes of his organization.

DCI mechanisms are capable of transmitting large amounts of data and withstand peak loads. In addition, they provide fault tolerance in the event of communication failures and the protection of transmitted data.

Federated search

With federated search, you can find the tasks or documents you need at once in all individual DIRECTUM systems. For example, start a search simultaneously in the working system and in the system with archived documents.


Federated search allows you to:

  • view through the web client the progress of approval of an outgoing document in a subsidiary;
  • find agreements concluded with a counterparty in all subsidiaries, for example, for the preparation of negotiations. In this case, you can go to the tasks in which the contracts are enclosed;
  • check the status of execution of the order sent from the parent organization to the subsidiary, or documents and tasks created on it;
  • find documents simultaneously in several systems with different specializations, for example, with organizational and administrative documents and with contracts;
  • find primary accounting documents for audit or reconciliation with a counterparty immediately in the working system and in the system with an archive of documents;
  • exchange links to search results with colleagues.

The administrator can change standard searches, add new ones, and also customize which systems will be visible to the user.

DIRECTUM Services Administration Center

The DIRECTUM system solves many different tasks: employee interaction, document storage, and so on. This is possible thanks to the reliable operation of its services. In large companies, entire installations of the DIRECTUM system with their own set of services are allocated for specific tasks, for example for storing archival documents. Installations and services are deployed on multiple servers, and this infrastructure needs to be administered.

The DIRECTUM Services Administration Center is a single administrative entry point for configuring, monitoring, and managing DIRECTUM services and systems. The Center is a site for management tools for Session Server, Workflow Service, Event Processing Service, File Storage Service, Input and Transform Services, Federated Search, and Web Help.


Convenient visual configuration of remote systems and services simplifies the administrator's work: there is no need to go to each server and manually edit configuration files.

Services are stopped and enabled in one click. The status of the services is instantly displayed on the screen.

The list of settings can be extended and filtered. By default, the site displays only the basic settings, and for every setting there are hints with recommendations on how to fill it in.

The DIRECTUM system effectively organizes the work of distributed organizations and provides users with a transparent exchange of documents, tasks and directory records.

Each component of a Managed Distributed Architecture can be used separately, but together they can bring greater business value to your organization.

Currently, all information systems (IS) developed for commercial purposes have a distributed architecture, which implies the use of wide-area and/or local-area networks.

Historically, the file-server architecture was the first to become widespread, since its logic is simple and existing ISs are easiest to migrate to it. It was then transformed into the client-server architecture, which can be seen as its logical continuation. Modern systems used on the Internet mainly belong to the distributed object architecture (see Fig. III-15).


An IS can be thought of as consisting of the following components (Fig. III-16).

III.03.2.a File-server applications.

Historically this is the first distributed architecture (Fig. III-17). It is organized very simply: only the data reside on the server, and everything else belongs to the client machine. Since local networks are quite cheap and the application software in such an architecture is autonomous, this architecture is still often used today. It can be regarded as a variant of the client-server architecture in which only data files are located on the server. Different personal computers interact only through the shared data store, so programs written for a single computer are the easiest to adapt to this architecture.


Pros of the file-server architecture:

Ease of organization;

It does not conflict with the requirements imposed on the database for maintaining integrity and reliability.

Cons:

Network congestion;

Unpredictable response time to a request.

These disadvantages stem from the fact that any request to the database leads to the transfer of significant amounts of information over the network. For example, to select one or several rows from a table, the entire table is downloaded to the client machine, where the DBMS performs the selection. Heavy network traffic is a particular problem when remote access to the database is organized.
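To make the difference concrete, here is a minimal Python sketch contrasting the two ways of answering the same query. The CSV path, table layout and the generic DB-API connection are hypothetical assumptions for illustration only: in the file-server case the whole table crosses the network and is filtered locally, while in the client-server case only the query text and the matching rows do.

```python
import csv

def file_server_select(shared_path: str, city: str) -> list[dict]:
    """File-server style: the whole table travels over the network share;
    selection is done by the DBMS/application on the client machine."""
    with open(shared_path, newline="", encoding="utf-8") as f:  # e.g. a table on \\fileserver\db\customers.csv
        rows = list(csv.DictReader(f))                          # the entire table is read across the network
    return [r for r in rows if r["city"] == city]               # filtering happens locally

def client_server_select(connection, city: str) -> list:
    """Client-server style: only the query and the matching rows cross the network.
    `connection` is any DB-API connection to a database server."""
    cur = connection.cursor()
    cur.execute("SELECT * FROM customers WHERE city = ?", (city,))
    return cur.fetchall()
```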

III.03.2.b Client-server applications.

In this case, responsibilities are divided between the server and the client. Depending on how they are divided, thick and thin clients are distinguished.


In the thin client model, all application work and data management are done on the server. Only the user interface "migrates" to the personal computer, while the server application runs all application processes and manages the data. The thin client model can also be implemented where the clients are ordinary computers or workstations; the client devices then run a web browser and the user interface implemented within the system.

The main drawback of the thin client model is the high load on the server and the network. All calculations are performed on the server, which can lead to significant traffic between the client and the server. Modern computers have plenty of computing power, but in the thin client model it remains practically unused.

In contrast, the thick client model uses the processing power of the local machines: the application itself is placed on the client computer. An example of this type of architecture is an ATM system, in which the ATM is the client and the server is the central computer that maintains the customer accounts database.

III.03.2.c Two- and three-tier client-server architecture.

All of the architectures discussed above are two-tier: they distinguish between the client level and the server level. Strictly speaking, an IS consists of three logical levels:

· User level;

· Application level;

· Data level.

Therefore, in a two-tier model, where only two levels are involved, scalability and performance problems arise if the thin client model is chosen, and system management problems arise if the thick client model is chosen. These problems can be avoided by applying a model consisting of three levels, two of which are servers (Fig. III-21).


In fact, the application server and the data server can be located on the same machine, but they cannot perform each other's functions. The advantage of the three-tier model is that it logically separates application execution from data management.

Table III-5. Applications of different types of architectures

Two-tier, thin client: 1) legacy systems in which it is not advisable to separate application execution and data management; 2) compute-intensive applications with little data management; 3) applications with large amounts of data but little computation.

Two-tier, thick client: 1) applications where the user requires intensive data processing, for example data visualization; 2) applications with a relatively constant set of user functions applied in a well-managed system environment.

Three-tier client-server: 1) large applications with hundreds or thousands of clients; 2) applications in which both the data and the processing methods change frequently; 3) applications that integrate data from multiple sources.

This model is suitable for many types of applications, but it constrains IS developers, who must decide where to provide services, plan for scalability, and develop tools for connecting new clients.

III.03.2.d Distributed object architecture.

A more general approach is provided by the distributed object architecture, in which objects are the main components. They provide a set of services through their interfaces, and other objects send them requests without distinguishing between client and server. Objects can be located on different computers in the network and interact through middleware which, like a system bus connecting different devices, maintains communication between the parts of the system.

Fig.: ODBC architecture. An application working with SQL calls the ODBC Driver Manager, which loads drivers 1..K, each of which works with its own database (DB 1..DB K).

The ODBC architecture includes the following components (a sketch of the resulting call sequence is given after the list):

1. Application (for example, an IS). It requests a connection to the data source, sends SQL statements to it, defines the storage areas and data formats for the results of those statements, handles errors and notifies the user about them, commits or rolls back transactions, and requests that the connection to the data source be closed.

2. Driver Manager. It loads drivers on demand from applications and offers them a single interface, which is the same regardless of which DBMS the application interacts with. The Driver Manager supplied by Microsoft is a dynamic-link library (DLL).

3. Driver, which depends on the DBMS. An ODBC driver is a dynamic-link library (DLL) that implements the ODBC functions and interacts with a data source. A driver processes calls to DBMS-specific functions (it may rewrite requests to match the DBMS) and returns results to the application. Every DBMS that supports ODBC must provide application developers with a driver for that DBMS.

4. Data source. It contains the control information specified by the user and the information about the data source needed to access a specific DBMS; the facilities of the OS and the network platform are used for this.
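As an illustration of this call sequence, here is a hedged Python sketch using the pyodbc binding (one possible ODBC binding, not prescribed by the text). The DSN name "SalesDSN", the credentials and the orders table are assumptions made only for the example.

```python
import pyodbc

def fetch_orders(customer_id: int):
    # 1. The application asks for a connection to a data source; the Driver Manager
    #    loads the DBMS-specific driver registered for the hypothetical DSN "SalesDSN".
    conn = pyodbc.connect("DSN=SalesDSN;UID=app_user;PWD=secret")
    try:
        cur = conn.cursor()
        # 2. The SQL text is passed through the driver, which may adapt it to the DBMS.
        cur.execute("SELECT id, total FROM orders WHERE customer_id = ?", customer_id)
        rows = cur.fetchall()            # 3. Results are returned to the application.
        conn.commit()                    # 4. The application commits (or rolls back) the transaction.
        return rows
    except pyodbc.Error as exc:          # 5. Errors are handled and reported by the application.
        conn.rollback()
        raise RuntimeError(f"ODBC error: {exc}") from exc
    finally:
        conn.close()                     # 6. The connection to the data source is closed.
```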

Dynamic model

This model covers many aspects, for which at least five UML diagrams are used; see sections 2.04.2–2.04.5.

Consider the control aspect. The control model complements the structural models.

However the structure of the system is described, it consists of a set of structural units (functions or objects). For them to work as a whole they must be controlled, and the static diagrams contain no control information. Control models describe the flow of control between subsystems.

There are two main types of control in software systems:

1. Centralized control.

2. Event-based control.

Centralized control can be:

· Hierarchical, based on the "call-return" principle (this is how most training programs work);

· The dispatcher model, which is used for parallel systems.

In the dispatcher model it is assumed that one of the components of the system is a dispatcher. It manages both the startup and shutdown of subsystems and the coordination of the other processes in the system. Processes can run in parallel with each other. Here a process means a program, subsystem, or procedure that is currently running. This model can also be applied in sequential systems, where the control program calls individual subsystems depending on some state variables (via a case statement).
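A minimal Python sketch of the dispatcher model, with purely illustrative subsystem names: the dispatcher alone decides which subsystems run, whether they run in parallel, and when the system stops; in the sequential variant it simply dispatches on a state variable, playing the role of the case statement mentioned above.

```python
import threading

def subsystem_a():
    print("subsystem A: processing")

def subsystem_b():
    print("subsystem B: processing")

def dispatcher(state: str) -> None:
    """Central component: starts, coordinates and stops the other processes."""
    if state == "parallel":
        workers = [threading.Thread(target=f) for f in (subsystem_a, subsystem_b)]
        for w in workers:
            w.start()                    # parallel startup of the subsystems
        for w in workers:
            w.join()                     # the dispatcher also decides when the system stops
    else:
        # sequential variant: dispatch on a state variable
        {"a": subsystem_a, "b": subsystem_b}.get(state, subsystem_a)()

dispatcher("parallel")
```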

Event-based control assumes that there is no single subroutine responsible for control. Control is driven by external events: a mouse button press, a key press, a change in sensor readings, a timer tick, and so on. Each external event is encoded and placed in an event queue. If a reaction to an event in the queue is defined, the procedure (subroutine) that implements the reaction to that event is called. The events to which the system reacts can originate either in other subsystems or in the external environment of the system.

An example of such control is the organization of applications in Windows.
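A minimal Python sketch of event-based control, with illustrative event codes and handlers: events are placed in a queue and dispatched to whichever handler is registered for them, and no central subroutine owns the control flow.

```python
import queue

# handlers registered for coded external events (names are illustrative)
handlers = {
    "MOUSE_CLICK": lambda e: print("click at", e["pos"]),
    "KEY_PRESS":   lambda e: print("key", e["key"]),
    "TIMER":       lambda e: print("timer tick"),
}

events = queue.Queue()
events.put({"type": "KEY_PRESS", "key": "Enter"})
events.put({"type": "MOUSE_CLICK", "pos": (10, 20)})
events.put({"type": "QUIT"})

while True:                              # the event loop
    event = events.get()
    if event["type"] == "QUIT":
        break
    handler = handlers.get(event["type"])
    if handler:                          # react only if a reaction to this event is defined
        handler(event)
```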

All of the previously described structural models can be implemented using either centralized or event-based control.

User interface

When developing an interface model, one should take into account not only the tasks of the designed software, but also the features of the brain associated with the perception of information.

III.03.4.a Psychophysical characteristics of a person associated with the perception and processing of information.

The part of the brain, which can be conventionally called a processor of perception, constantly, without the participation of consciousness, processes incoming information, compares it with past experience and puts it in storage.

When a visual image attracts our attention, the information of interest arrives in short-term memory. If our attention is not attracted, the information disappears from storage, replaced by the next portions.

At each moment of time, the focus of attention can be fixed at one point, so if it becomes necessary to simultaneously track several situations, then the focus moves from one tracked object to another. At the same time, attention is scattered, and some details may be overlooked. It is also significant that perception is largely based on motivation.

When you change the frame, the brain is blocked for a while: it masters a new picture, highlighting the most significant details. This means that if you need a quick response from the user, then you shouldn't change the pictures abruptly.

Short-term memory is the bottleneck in a person's information processing system. Its capacity is 7 ± 2 unconnected objects. Unclaimed information is stored in it for no more than 30 seconds. In order not to forget any important information for us, we usually repeat it to ourselves, updating the information in short-term memory. Thus, when designing interfaces, it should be borne in mind that the overwhelming majority find it difficult, for example, to remember and enter numbers containing more than five digits on another screen.

Although the capacity and retention time of long-term memory are unlimited, access to the information is not easy. The mechanism for retrieving information from long-term memory is associative in nature: to improve memorization, new information is tied to data the memory already stores, which makes it easier to retrieve. Since access to long-term memory is difficult, it is advisable to rely not on the user remembering information, but on the user recognizing it.

III.03.4.b Basic criteria for evaluating interfaces

Numerous surveys and polls conducted by leading software development firms have shown that users value the following in an interface:

1) ease of learning and memorization, estimated specifically by the time needed to learn the interface and by how long the information is retained in memory;

2) the speed of achieving results when using the system, which is determined by the number of commands and settings entered or selected by the mouse;

3) subjective satisfaction with the operation of the system (ease of use, fatigue, etc.).

Moreover, for professional users who constantly work with the same package, the second and third criteria quickly come to the fore, while for non-professional users who work with software only occasionally and perform relatively simple tasks, the first and third do.

From this point of view, the best characteristics today are offered to professional users by free-navigation interfaces, and to non-professional users by direct-manipulation interfaces. It has long been noticed that, all other things being equal, most professionals copy files using shells like Far, while non-professionals use Windows drag-and-drop.

III.03.4.c Types of user interfaces

The following types of user interfaces are distinguished:

Primitive;

Menu-based;

Free navigation;

Direct manipulation.

Primitive interface.

A primitive interface is one that organizes interaction with the user in console mode. The only deviation from a strictly sequential data-entry process that it provides is looping over several sets of data.

Menu interface.

Unlike the primitive interface, it allows the user to select an operation from a special list displayed by the program. Such interfaces support many working scenarios in which the sequence of actions is determined by the users. The tree-like organization of the menu implies, however, that finding an item in a menu of more than two levels is difficult.

Principles of creating an enterprise-wide information processing system

The history of computer technology (and, accordingly, of software) began with separate, autonomous systems. Scientists and engineers were preoccupied with creating the first computers and mainly puzzled over how to make these masses of vacuum tubes work. But this state of affairs did not last long: the idea of combining computing power was obvious and hung in the air, saturated with the hum of the metal cabinets of the first ENIACs and Marks. After all, the idea of combining the efforts of two or more computers to solve problems too heavy for any one of them lies on the surface.

Fig. 1. Scheme of distributed computing

However, the practical implementation of the idea of connecting computers into clusters and networks was hampered by the lack of technical solutions and, above all, by the need to create standards and communication protocols. As is well known, the first computers appeared in the late 1940s, while the first computer network, ARPANET, which connected several computers in the United States, appeared only in 1969, some twenty years later. Of course, such a combination of computing capabilities only vaguely resembled a modern distributed architecture, but it was nevertheless the first step in the right direction.

Over time, the emergence of local area networks led to a new area of software development: the creation of distributed applications. This had to be done from scratch, as they say, but, fortunately, large companies whose business structure required such solutions immediately showed interest in these applications. It was at the stage of creating corporate distributed applications that the basic requirements were formulated and the main architectures of such systems were developed, which are still used today.

Gradually, mainframes and terminals evolved toward the client-server architecture, which was essentially the first version of a distributed architecture, that is, a two-tier distributed system. Indeed, it was in client-server applications that part of the computational operations and business logic moved to the client side, and this became the hallmark of the approach.

It was during this period that it became apparent that the main advantages of distributed applications are:

· Good scalability - if necessary, the computing power of a distributed application can be easily increased without changing its structure;

· The ability to manage load - intermediate levels of a distributed application make it possible to manage the flows of user requests and redirect them to less loaded servers for processing;

· Globality - a distributed structure allows you to follow the spatial distribution of business processes and create client workstations at the most convenient points.

As time went on, small islands of university, government and corporate networks expanded and merged into regional and national systems. And then the main player appeared on the scene: the Internet.

Words of praise for the World Wide Web have long been commonplace in publications on computing. Indeed, the Internet has played a pivotal role in the development of distributed computing and has made this rather specialized area of software development the focus of an army of professional programmers. Today it significantly expands the use of distributed applications, allowing remote users to connect and making application functions available everywhere.

This is the history of the issue. Now let's take a look at what distributed applications are.

Distributed computing paradigm

Imagine a fairly large manufacturing facility, trading company, or service provider. All of their divisions already have their own databases and specific software. The central office somehow collects information about the current activities of these departments and provides managers with information on the basis of which they make management decisions.

Let's go further and suppose that the organization we are considering is successfully developing, opening branches, developing new types of products or services. Moreover, at the last meeting, progressive-minded executives decided to organize a network of remote workstations from which customers could receive some information about the fulfillment of their orders.

In this situation, one can only pity the head of the IT department if he did not take care of building a general business-flow management system in advance, because without it the effective development of the organization will be very hard to ensure. Nor can one do without an enterprise-wide information processing system designed with the growing load in mind and matching the main business flows, since all departments must not only perform their own tasks but also, if necessary, process requests from other departments and even (a nightmare for the project manager!) from customers.

So, we are ready to formulate the basic requirements for modern enterprise-scale applications dictated by the very organization of the production process.

Spatial separation. The divisions of the organization are dispersed in space and often have poorly unified software.

Structural Compliance. The software should adequately reflect the information structure of the enterprise - it should correspond to the main data streams.

Orientation to external information. Modern enterprises are forced to pay increased attention to working with customers. Therefore, enterprise software must be able to work with a new type of user and their needs. Such users by design have limited rights and access to a strictly defined kind of data.

All of the above requirements for enterprise-scale software are met by distributed systems; the scheme of distributed computing is shown in Fig. 1.

Of course, distributed applications are not free from flaws. Firstly, they are expensive to operate, and secondly, the creation of such applications is a laborious and complex process, and the cost of an error at the design stage is very high. Nonetheless, the development of distributed applications is progressing well - the game is worth the candle, because such software helps to improve the efficiency of the organization.

So, the paradigm of distributed computing implies the presence of several centers (servers) for storing and processing information, which implement different functions and are spaced apart. In addition to requests from the system's clients, these centers must also fulfil each other's requests, since in some cases solving a single task requires the joint efforts of several servers. To manage complex requests and the functioning of the system as a whole, specialized control software is required. And finally, the entire system must be "immersed" in some transport environment that ensures the interaction of its parts.

Distributed computing systems have the following common properties:

· Manageability - implies the ability of the system to effectively control its components. This is achieved through the use of control software;

· Performance - provided due to the possibility of redistributing the load on the servers of the system using the control software;

· Scalability - if performance needs to be increased, a distributed system can easily integrate new computing resources into its transport environment;

· Extensibility - new components (server software) with new functions can be added to distributed applications.

Access to data in distributed applications is possible from client software and from other distributed systems, and it can be organized at various levels: from client software and transport protocols to the protection mechanisms of database servers.

Fig. 2. The main levels of the architecture of a distributed application

The listed properties of distributed systems are sufficient reason to put up with the complexity of their development and high cost of maintenance.

Distributed Application Architecture

Consider the architecture that allows a distributed application to perform complex and varied functions. Different sources offer different options for building distributed applications, and they all have a right to exist, because such applications solve a very wide range of problems in many subject areas, and the relentless evolution of development tools and technologies pushes toward continuous improvement.

Nevertheless, there is a most general architecture of a distributed application, according to which it is divided into several logical layers of data processing. Applications, as we know, are designed to process information, and three main functions can be distinguished here:

· Data presentation (user level). Here application users can view the necessary data, send a request for execution, enter new data into the system or edit it;

· Data processing (intermediate level, middleware). At this level, the business logic of the application is concentrated, the data flows are controlled and the interaction of the application parts is organized. It is the concentration of all data processing and control functions at one level that is considered the main advantage of distributed applications;

· Data storage (data layer). This is the database server tier. The servers themselves, databases, data access tools, and various auxiliary tools are located here.

This architecture is often called three-level or three-tier, and very often the structure of the application under development is built on these "three whales". It is always noted that each level can be further subdivided into several sublevels. For example, the user level can be broken down into the user interface proper and the rules for validating and processing input data.

Of course, if we take into account the possibility of splitting into sublevels, then any distributed application can be fitted into the three-tier architecture. But one cannot ignore another characteristic feature inherent in distributed applications: data management. Its importance is obvious, since it is very difficult to create a real-world distributed application (with all its client stations, middleware, database servers, and so on) that does not manage its requests and responses. Therefore, a distributed application must have one more logical layer: the data management layer.

Fig. 3. Distribution of business logic across the levels of a distributed application

Therefore, it is advisable to divide the intermediate level into two independent ones: the data processing level (which retains its important advantage, the concentration of the business rules for data processing) and the data management level. The latter controls the execution of requests, manages the data streams and organizes the interaction of the parts of the system.

Thus, there are four main layers of a distributed architecture (see Fig. 2):

· Data presentation (user level);

· Business logic rules (data processing layer);

· Data management (data management layer);

· Data storage (data storage layer).

Three of the four levels (all except the first) are directly involved in data processing, while the data presentation layer allows the data to be visualized and edited. Through this layer users receive data from the data processing layer, which, in turn, retrieves information from the storages and performs all the necessary data transformations. After new information is entered or existing data are edited, the data streams flow in the opposite direction: from the user interface through the business rules layer to the storage.

Another layer, data management, stands aside from this data backbone, but it ensures the smooth operation of the entire system by managing requests, responses and the interaction of the parts of the application.
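A schematic Python sketch of these four layers and of the request/response flow between them; every function name is illustrative, and each layer is reduced to a stub standing in for a real client, middleware component or SQL server.

```python
def storage_get(query: str) -> list[dict]:                 # data storage layer
    return [{"order": 1, "total": 100.0}]                  # stands in for a SQL server

def management_route(target, request):                     # data management layer
    # a real system would log the request, balance load, keep connections alive, etc.
    return target(request)

def processing_get(query: str) -> list[dict]:              # data processing layer (business rules)
    rows = management_route(storage_get, query)            # requests pass through data management
    return [r for r in rows if r["total"] > 0]             # apply a business rule

def presentation_show(query: str) -> None:                 # presentation (user) layer
    for row in processing_get(query):
        print(row)

presentation_show("orders for today")
```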

The case of viewing data in "read-only" mode deserves separate mention. Here the data processing layer is not involved in the general data-transfer scheme, since no changes need to be made, and the flow of information is unidirectional: from the storage to the data presentation level.

Physical structure of distributed applications

Now let's turn to the physical layers of distributed applications. The topology of a distributed system implies a division into several database servers, data processing servers, and a collection of local and remote clients. All of them can be located anywhere: in the same building or on different continents. In any case, the parts of a distributed system must be connected by reliable and secure communication lines. As for the data transfer rate, it depends largely on how important the connection between two parts of the system is for data processing and transmission, and to a lesser extent on how far apart they are.

Distribution of business logic across distributed application tiers

Now is the time to move on to a detailed description of the levels of a distributed system, but first let's say a few words about the distribution of application functionality across levels. Business logic can be implemented at any of the levels of the three-tier architecture.

Database servers can not only store data in databases, but also contain part of the application's business logic in stored procedures, triggers, etc.

Client applications can also implement data processing rules. If the set of rules is minimal and comes down mainly to procedures for checking the correctness of data entry, we are dealing with a "thin" client. In contrast, a thick client contains a large proportion of the application's functionality.

The level of data processing is actually intended to implement the business logic of the application, and all the basic rules for data processing are concentrated here.

Thus, in the general case, the functionality of the application is "smeared" throughout the application. All the variety of distribution of business logic across application tiers can be represented as a smooth curve showing the proportion of data processing rules concentrated in a specific place. The curves in Fig. 3 are qualitative in nature, but nevertheless allow you to see how changes in the structure of the application can affect the distribution of rules.

And practice confirms this conclusion. After all, there will always be a couple of rules that need to be implemented in the stored procedures of the database server, and it is very often convenient to transfer some initial operations with data to the client side - at least in order to prevent the processing of incorrect requests.

Presentation layer

The data presentation layer is the only one available to the end user. This layer comprises the client workstations of a distributed application and the corresponding software. The capabilities of a client workstation are determined primarily by the capabilities of its operating system. Depending on the type of user interface, client software is divided into two groups: clients that use GUI capabilities (for example, of Windows) and Web clients. In any case, the client application must provide the following functions:

· Receiving data;

· Presentation of data for viewing by the user;

· Data editing;

· Checking the correctness of the entered data;

· Saving the changes made;

· Handling exceptions and displaying information about errors for the user.

It is desirable to concentrate all business rules at the data processing level, but in practice this is not always possible. Two types of client software are then distinguished. A thin client contains a minimal set of business rules, while a thick client implements a significant portion of the application logic. In the first case, the distributed application is much easier to debug, upgrade and extend; in the second, the cost of creating and maintaining the data management layer can be minimized, since some operations are performed on the client side and only data transfer falls to the middleware.
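A minimal thin-client sketch in Python covering the functions listed above (receiving and displaying data, validating input, saving changes, reporting errors). The request_server helper is a hypothetical stand-in for a call to the middleware, not part of any real API.

```python
def request_server(action: str, payload: dict) -> dict:
    # placeholder for a call to the data processing layer (e.g. over HTTP or a socket)
    return {"status": "ok", "data": payload}

def edit_customer_phone(raw_phone: str) -> None:
    if not raw_phone.isdigit() or len(raw_phone) > 15:              # checking correctness of entered data
        print("Error: the phone number must be up to 15 digits")    # displaying error information
        return
    reply = request_server("save_customer", {"phone": raw_phone})   # saving the changes made
    if reply["status"] == "ok":
        print("Saved:", reply["data"])                               # presenting data to the user
    else:
        print("Server rejected the change:", reply)                  # handling an exceptional situation

edit_customer_phone("79001234567")
```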

Data processing layer

The data processing layer combines the parts that implement the business logic of the application and is an intermediary between the presentation layer and the storage layer. All data pass through it and undergo the changes dictated by the problem being solved (see Fig. 2). The functions of this level include:

· Processing of data streams in accordance with business rules;

· Interacting with the data presentation layer to receive requests and return responses;

· Interaction with the data storage layer to send requests and receive responses.

Most often, the data processing layer is equated with the middleware of a distributed application. This situation is fully true for an "ideal" system and only partially for real applications (see Fig. 3). As for the latter, the middleware for them contains a large proportion of data processing rules, but some of them are implemented in SQL servers in the form of stored procedures or triggers, and some are included in the client software.

Such "blurring" of business logic is justified, since it allows to simplify some of the data processing procedures. Let's take a classic example of an order statement. It can include the names of only those products that are in stock. Therefore, when adding a certain item to the order and determining its quantity, the corresponding number must be subtracted from the remainder of this item in the warehouse. Obviously, the best way to implement this logic is through the DB server — either a stored procedure or a trigger.

Data management layer

The data management layer is needed so that the application remains coherent, resilient and reliable, and can be upgraded and scaled. It ensures the execution of system tasks; without it, the parts of the application (database servers, application servers, middleware, clients) would not be able to interact with each other, and connections broken when the load increases could not be restored.

In addition, various system services of the application can be implemented at the data management level. After all, there are always functions common to the entire application that are necessary for the operation of all levels of the application, therefore, they cannot be located on any of the other levels.

For example, a time stamp service provides all parts of an application with system timestamps that keep them in sync. Imagine that a distributed application has a server that sends clients tasks with a specific deadline. If the deadline is missed, the task should be registered along with the calculated delay. If the client workstations are located in the same building as the server, or on a neighboring street, there is no problem: the accounting algorithm is simple. But what if the clients are located in different time zones, in other countries or even overseas? In this case, the server must be able to calculate the difference with time zones in mind when sending tasks and receiving responses, and the clients must add service information about their local time and time zone to their reports. If a single time service is included in the distributed application, this problem simply does not exist.
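A minimal sketch of such a single time service in Python: the service issues all timestamps in UTC, so deadline accounting does not depend on any client's time zone. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

def time_service_now() -> datetime:
    """The single source of time for every part of the distributed application."""
    return datetime.now(timezone.utc)

def issue_task(hours_to_complete: int) -> datetime:
    return time_service_now() + timedelta(hours=hours_to_complete)   # deadline expressed in UTC

def register_result(deadline: datetime) -> timedelta:
    delay = time_service_now() - deadline
    return max(delay, timedelta(0))     # zero if the deadline was met, otherwise the overdue time

deadline = issue_task(24)
print("overdue by:", register_result(deadline))
```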

In addition to a single time service, the data management level can contain services for storing general information (information about the application as a whole), generating summary reports, and so on.

So, the functions of the data management layer include:

· Managing parts of a distributed application;

· Management of connections and communication channels between parts of the application;

· Control of data flows between clients and servers and between servers;

· Load control;

· Implementation of system services of the application.

It should be noted that the data management layer is often built on ready-made solutions supplied to the software market by various vendors. If the developers have chosen the CORBA architecture for their application, it includes an Object Request Broker (ORB); if the platform is Windows, a variety of tools are at their service: COM+ (an evolution of Microsoft Transaction Server, MTS), the MSMQ message-queuing technology, Microsoft BizTalk, and so on.

Data storage layer

The storage tier brings together the SQL servers and databases used by the application. It provides a solution to the following tasks:

· Storing data in a database and keeping them in working order;

· Processing requests from the data processing layer and returning the results;

· Implementation of a part of the business logic of a distributed application;

· Management of distributed databases using administrative tools of database servers.

In addition to the obvious functions of storing data and processing queries, this layer can contain part of the application's business logic in stored procedures, triggers, constraints, and so on. Moreover, the very structure of the application database (tables and their fields, indexes, foreign keys, and so on) is itself an implementation of the data structure the distributed application works with, as well as of some of its business rules. For example, using a foreign key in a database table imposes a corresponding restriction on data manipulation: records of the main table cannot be deleted while records linked to them by the foreign key exist.
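A hedged sketch of this foreign-key restriction, again using SQLite only to keep the example self-contained and with illustrative table names: deleting a referenced row of the main table is rejected by the storage layer itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")      # SQLite enforces foreign keys only when asked to
db.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'ACME');
    INSERT INTO orders VALUES (10, 1);
""")
try:
    db.execute("DELETE FROM customers WHERE id = 1")   # the main-table row is still referenced by orders
except sqlite3.IntegrityError as exc:
    print("rejected by the storage layer:", exc)
```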

Most database servers support a variety of administration procedures, including distributed database management. These include data replication, remote archiving, tools for accessing remote databases, etc. The ability to use these tools should be considered when developing the structure of your own distributed application.

Connections to SQL server databases are made primarily with the server's client software. In addition, various data access technologies can be used, for example ADO (ActiveX Data Objects) or ADO.NET. When designing a system, however, it must be borne in mind that, functionally, intermediate data access technologies do not belong to the data storage level.

Base Level Extensions

The levels of distributed application architecture described above are the basic ones. They form the structure of the application as a whole, but they cannot, of course, cover the implementation of every possible application: the subject areas and tasks are too vast and diverse. In such cases, the architecture of a distributed application can be extended with additional layers designed to reflect the features of the application being created.

Among others, two extensions of the base levels are used most often.

The business interface layer is located between the user interface layer and the data processing layer. It hides from client applications the details of the structure and implementation of the business rules of the data processing layer, abstracting the client application code from the implementation details of the application logic.

As a result, developers of client applications work with a fixed set of required functions, an analogue of an application programming interface (API). This makes the client software independent of the implementation of the data processing layer.

Of course, serious changes to the system cannot be made without global alterations, but the business interface layer allows you to avoid them unless absolutely necessary.
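A schematic Python sketch of a business interface layer: client code calls a small, stable set of functions and never sees how the data processing layer implements them. All class and method names here are illustrative, and the in-memory backend merely stands in for the real data processing layer.

```python
class OrdersInterface:
    """Facade exposed to client applications (an analogue of an API)."""
    def __init__(self, processing_backend):
        self._backend = processing_backend          # can be replaced without touching client code

    def place_order(self, customer_id: int, item: str, qty: int) -> int:
        return self._backend.create_order(customer_id, item, qty)

    def order_status(self, order_id: int) -> str:
        return self._backend.status(order_id)

class InMemoryProcessing:
    """Stand-in for the real data processing layer."""
    def __init__(self):
        self._orders = {}
    def create_order(self, customer_id, item, qty):
        order_id = len(self._orders) + 1
        self._orders[order_id] = (customer_id, item, qty)
        return order_id
    def status(self, order_id):
        return "registered" if order_id in self._orders else "unknown"

api = OrdersInterface(InMemoryProcessing())
print(api.order_status(api.place_order(1, "bolt", 5)))   # prints "registered"
```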

The data access layer is located between the data storage layer and the data processing layer. It allows you to make the structure of the application independent of a specific data storage technology. In such cases, the software objects of the data processing layer send requests and receive responses using the means of the chosen data access technology.

When implementing applications on the Windows platform, ADO data access technology is used most often, because it provides a universal way to access a wide variety of data sources, from SQL servers to spreadsheets. For applications on the .NET platform, ADO.NET is used.


