
The bottleneck. Which processor will "reveal" the video card? Removing a bottleneck gives a tangible gain in performance and efficiency

Lately, various IT bloggers have gained great popularity. And with all due respect to them, chasing profit from the hype they have produced a lot of odd information that many users now repeat without understanding it at all.

Yet truly simple information is nowhere to be found: it is either written with a lot of excess (for ordinary mortals) in complicated language, or it skips the very grains that you then have to gather from foreign forums, resources and so on.

So I decided to write a series of blog posts about how games work in general, what affects what, what "revealing a video card" means, and so on - and to describe it all as simply and accessibly as possible.

P.1. "How does it all work? In plain words, please!"

So that even the simple things further on don't read like a Chinese manual, let's first sort out what a "game" is and how our device shows it to us.

A game is, in essence, a 3D application. And like any application, it is assembled from different "pieces", like Lego.

Carrying the analogy through, we get:


1) The processor is the main brain: it builds the vertices and calculates the physics (in most cases). In our analogy, it is the one who reads the assembly instructions.

2) The video card draws the textures, applies the effects, "makes it pretty". By analogy, it is the one who assembles the kit according to the instructions.

3) The hard disk stores the game files themselves. By analogy, it is the box the construction kit came in.

4) RAM and video memory: RAM holds frequently accessed data, video memory stores textures. These are the kit pieces you pull out and keep next to you so you don't have to reach into the box every time.

As we can see, each component of our device, be it a PC, a console or even a smartphone, performs certain actions so that our game can run. This is, of course, the most primitive picture, but it is already enough to understand how it all works.

P.2. Does the processor reveal the video card?

There has been a lot of talk on this topic, including whether the concept exists at all. By my reckoning - yes, it does, in a certain sense.

There is such a concept as a "bottleneck". Put simply: someone does their job slowly, and the whole process stalls because of it. Returning to our analogy: either the instructions are being read too slowly, or the poor video card can't keep up with assembling the "bricks", or the parts were simply put too far away and it takes time to fetch them.

Now let's work out how the processor and the video card "get along", and who reveals whom.

Situation 1. The bottleneck is the video card:


As a result we get 15 frames per second on the screen. The video card works at full load while the processor runs at half capacity. This is actually the ideal situation; in this case people say that "the processor fully reveals the video card". During a game the processor also has to service the operating system's own subsystems and keep Skype / Viber / TeamSpeak and much else running, so a small processor "reserve" should remain.

What does this give us? On a PC we can lower the graphics settings so that the video card can build more "toy cars", that is, frames. That way we get more FPS in the game.

There is also the reverse situation:


Here we also get 15 frames, but now the processor runs at full load while the video card "idles" (rests). In this case people say that the processor does not reveal the video card.

What does this give us? In this scenario we cannot jump over our own head: we will never see more FPS than the processor can deliver. But since the video card is resting, we can make it assemble not plain plastic bricks but metal ones with engravings and rhinestones. In game terms: we can raise the resolution, enable better effects and more advanced anti-aliasing, right up to the point where the card is loaded at 100% while still delivering the same 15 frames.
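To make the two situations concrete, here is a minimal sketch (illustrative numbers, not measurements from any real system) that models the on-screen frame rate as the slower of the two workers: the processor prepares each frame and the video card renders it, so whichever is slower caps the FPS.

```python
# Minimal bottleneck model: each frame must be prepared by the CPU
# and rendered by the GPU; the slower stage caps the frame rate.

def effective_fps(cpu_fps: float, gpu_fps: float) -> float:
    """On-screen FPS is limited by the slower of the two stages."""
    return min(cpu_fps, gpu_fps)

# Situation 1: the GPU is the bottleneck (CPU could do 30, GPU only 15).
print(effective_fps(cpu_fps=30, gpu_fps=15))  # -> 15; GPU at 100%, CPU at ~50%

# Situation 2: the CPU is the bottleneck (GPU could do 30, CPU only 15).
print(effective_fps(cpu_fps=15, gpu_fps=30))  # -> 15; CPU at 100%, GPU rests

# Lowering graphics settings raises gpu_fps, which only helps in situation 1.
```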

P.2.1. So how do you figure out which processor and video card to get?

The Internet is full of hardware tests. When a video card is tested, ideal conditions are created for it so that it can show everything it is capable of. Processor tests do the same.

What do we need for a game to run at 60 frames without problems? Let's take The Witcher 3 as an example, because it has been tested on everything imaginable.

We need to determine which processor will let us see 60 frames in the game. Ideally we should take one with some headroom, so that the processor has something left for background tasks.


As we can see, even a Phenom II is enough for this game. With it we will see 60 frames - provided the video card does not become the bottleneck. So let's see which card that takes:


What do we see? To play at maximum settings at 60 FPS, we need a GTX 980 or higher.

And here is the most interesting part: in this game, at these very settings, even an old Phenom will "reveal" a GTX 980. So when asking "will my processor reveal this video card", simply look at what FPS your processor delivers in the games you care about, and then at what FPS the video card can produce.

In the second part I plan to cover the hard disk, SSD, RAM and video memory (and their influence on games).

P.S. Thank you for reading. This is my first blog entry, so I will be glad of constructive criticism. If you find inaccuracies, mistakes and so on, write in the comments and I will fix them.

Good day!

It was a fine day, and nothing foreshadowed trouble. Then the problem arrived: some application has become unacceptably slow, although a week / month / day ago everything was fine. It needs to be solved quickly, spending as little time as possible. The problem server runs Windows Server 2003 or later.

I hope this write-up will be brief, clear and useful both for beginner administrators and for more seasoned colleagues, since you can always find something new for yourself. Don't rush straight into studying the application's behavior. First it is worth checking: is the server's performance sufficient right now? Are there any bottlenecks limiting it?

PerfMon will help us here - a fairly powerful tool that ships with Windows. Let's start with a definition: a bottleneck is a resource that has reached its usage limit. Bottlenecks usually arise from poor resource planning, hardware problems or misbehaving applications.

If you open PerfMon, you will see dozens and hundreds of counters of all kinds, and their sheer number does not help a quick investigation. So, to start with, we will single out the 5 main possible bottlenecks to shorten the list of counters we need to study.

These are the processor, RAM, the storage subsystem (HDD / SSD), the network and processes. Below we go through each of these areas, the counters we need and their threshold values.

CPU

An overloaded processor clearly does not help applications run fast. To study its resources we single out just 4 counters:

Processor\% Processor Time

Measures the percentage of time the processor spends doing work rather than idling. The most intuitive counter of processor load. MS recommends moving to a faster processor if the value stays above 85%. But this depends on many factors; you need to know your own workload and its particulars, since the appropriate threshold may vary.

Processor\% User Time

Shows how much time the processor spends in user space. If the value is high, applications are consuming a lot of processor time and are worth a closer look, since the need for optimization is brewing.

Processor\% Interrupt Time

Measures the time the processor spends servicing interrupts. This counter can reveal hardware problems. MS recommends starting to worry if this value exceeds 15%: it means some device is responding very slowly to requests and should be checked.

System\Processor Queue Length

Shows the number of threads queued and waiting for their turn to execute. MS recommends considering a processor with more cores if this value exceeds the number of cores multiplied by two.
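As a sketch of how these four counters can be collected without opening the PerfMon GUI, here is a small Python wrapper around the standard Windows typeperf utility (counter paths as listed above; the sample count is arbitrary):

```python
import subprocess

# Counter paths from this section; typeperf ships with Windows.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Processor(_Total)\% User Time",
    r"\Processor(_Total)\% Interrupt Time",
    r"\System\Processor Queue Length",
]

def sample_counters(samples: int = 5) -> str:
    """Collect a few samples of the CPU counters as CSV text."""
    cmd = ["typeperf", *COUNTERS, "-sc", str(samples)]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    print(sample_counters())  # columns follow the order of COUNTERS
```

The 85% / 15% / two-per-core thresholds above can then be applied to the parsed CSV.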

RAM

A shortage of RAM can strongly affect overall system performance, forcing the system to lean on a slow HDD for paging. And even if the server appears to have plenty of RAM, memory can "leak". A memory leak is an uncontrolled decrease in the amount of free memory caused by bugs in programs. It is also worth noting that in Windows, virtual memory is the combined size of RAM and the paging file.

Memory\% Committed Bytes In Use

Shows virtual memory usage. If the value exceeds 80%, it is worth thinking about adding RAM.

Memory\Available MBytes

Shows RAM usage, namely the number of megabytes available. If the value falls below 5% of total RAM, again it is time to think about adding more.

Memory\Free System Page Table Entries

The number of free page table entries. It is finite; moreover, these days pages of 2 MB and larger are gaining popularity over the classic 4 KB, which does not exactly increase the entry count. A value below 5,000 may indicate a memory leak.

Memory\Pool Nonpaged Bytes

The size of the nonpaged pool. This is a region of kernel memory that holds important data and cannot be paged out to swap. If the value exceeds 175 MB, it is most likely a memory leak, usually accompanied by event 2019 entries in the system log.

Memory\Pool Paged Bytes

Similar to the previous counter, except this area can be paged out to disk (swap) when unused. A value above 250 MB is considered critical for this counter, usually accompanied by event 2020 entries in the system log. It also points to a memory leak.

Memory\Pages/sec

The number of page file accesses (reads/writes) per second caused by the required data not being in RAM. Once again, a value above 1,000 hints at a memory leak.
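A rough way to watch the RAM side from a script is sketched below with the psutil library. Its gauges do not map one-to-one onto the PerfMon counters above (for instance, virtual_memory().percent tracks physical RAM rather than the commit charge), so treat this as an approximation of the approach, not a replacement for PerfMon:

```python
import time
import psutil  # pip install psutil

CRITICAL_PCT = 80  # by analogy with the 80% committed-bytes threshold above

prev_avail_mb = None
for _ in range(10):  # ten samples, one minute apart
    vm = psutil.virtual_memory()
    avail_mb = vm.available // (1024 * 1024)
    if vm.percent > CRITICAL_PCT:
        print(f"memory usage {vm.percent}% - consider adding RAM")
    if prev_avail_mb is not None and avail_mb < prev_avail_mb:
        # a steady downward trend across many samples hints at a leak
        print(f"available memory fell: {prev_avail_mb} -> {avail_mb} MB")
    prev_avail_mb = avail_mb
    time.sleep(60)
```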

HDD

A rather important component that can contribute substantially to overall system performance.

LogicalDisk\% Free Space

The percentage of free space. Only partitions holding system files are of interest: the OS, swap/paging files and so on. MS recommends taking care of expanding disk space if less than 15% remains free, since under critical load it can run out (temp files, Windows updates or that same paging file). But, as they say, "it depends", and you should look at the space actually available, because the paging file can be pinned to a fixed size, the TEMP directories may have quotas that stop them from growing, and updates arrive in portions and rarely, or not at all.

PhysicalDisk\% Idle Time

Shows how much of the time the disk is idle. It is recommended to replace the disk with a faster one if this counter drops below the 20% mark.

PhysicalDisk\Avg. Disk sec/Read

The average time the disk takes to complete a read. Above 25 ms is already bad; for SQL Server and Exchange, 10 ms or less is recommended. The remedy is the same as above.

PhysicalDisk\Avg. Disk sec/Write

Identical to PhysicalDisk\Avg. Disk sec/Read, but for writes. The critical threshold is likewise 25 ms.

PhysicalDisk\Avg. Disk Queue Length

Shows the average number of I/O operations waiting for the disk to become available. It is recommended to start worrying if this number is more than twice the number of spindles in the system (without RAID arrays, the number of spindles equals the number of hard disks). The advice is the same as before: a faster disk.

Memory\Cache Bytes

The amount of memory used for the cache, part of which is the file cache. A value above 300 MB can indicate an HDD performance problem or the presence of an application that makes heavy use of the cache.
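The read/write latency and free-space thresholds above can also be approximated from a script; here is a minimal psutil sketch (cumulative counters differenced over an interval, system drive assumed to be C:):

```python
import time
import psutil  # pip install psutil

def avg_disk_latency_ms(interval: float = 10.0):
    """Approximate Avg. Disk sec/Read and sec/Write in milliseconds."""
    a = psutil.disk_io_counters()
    time.sleep(interval)
    b = psutil.disk_io_counters()
    reads = b.read_count - a.read_count
    writes = b.write_count - a.write_count
    read_ms = (b.read_time - a.read_time) / reads if reads else 0.0
    write_ms = (b.write_time - a.write_time) / writes if writes else 0.0
    return read_ms, write_ms

read_ms, write_ms = avg_disk_latency_ms()
print(f"read {read_ms:.1f} ms, write {write_ms:.1f} ms (above 25 ms is bad)")

free_pct = 100 - psutil.disk_usage("C:\\").percent
if free_pct < 15:
    print(f"only {free_pct:.0f}% free on C: - below the 15% guideline")
```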

Network

In the modern world there is no getting anywhere without it: an enormous amount of data travels over the network.

Network Interface\Bytes Total/sec

The amount of data passing through the network adapter (send/receive). A value exceeding 70% of the interface bandwidth indicates a possible problem. You need to either replace the card with a faster one, or add another to offload the first.

Network Interface\Output Queue Length

Shows the number of packets queued for transmission. If the value exceeds 2, it is worth thinking about replacing the card with a faster one.
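Checking the 70%-of-bandwidth rule by hand is tedious, so here is a hedged sketch that estimates per-adapter utilization with psutil (link speed as reported by the driver; adapters reporting zero speed are skipped):

```python
import time
import psutil  # pip install psutil

def nic_utilization(interval: float = 5.0):
    """Print each NIC's traffic as a share of its link speed."""
    before = psutil.net_io_counters(pernic=True)
    time.sleep(interval)
    after = psutil.net_io_counters(pernic=True)
    stats = psutil.net_if_stats()
    for nic, b in after.items():
        a = before.get(nic)
        speed_mbps = stats[nic].speed if nic in stats else 0
        if a is None or not speed_mbps:
            continue  # unknown link speed, nothing to compare against
        total = (b.bytes_sent - a.bytes_sent) + (b.bytes_recv - a.bytes_recv)
        used_mbps = total * 8 / interval / 1_000_000
        pct = 100 * used_mbps / speed_mbps
        mark = "  <- above the 70% rule!" if pct > 70 else ""
        print(f"{nic}: {used_mbps:.1f} of {speed_mbps} Mbit/s ({pct:.0f}%){mark}")

nic_utilization()
```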

Processes

Server performance can suffer disastrously if an application is poorly optimized or starts to misbehave.

Process\Handle Count

The number of handles held by the process; these can be files, registry keys and more. A count exceeding 10,000 can be a sign of a misbehaving application.

Process\Thread Count

The number of threads inside the process. It is worth studying the application's behavior carefully if the difference between the minimum and maximum thread count exceeds 500.

Process\Private Bytes

Shows the amount of memory allocated to the process that cannot be shared with other processes. If this indicator oscillates by more than 250 MB between minimum and maximum, it points to a possible memory leak.
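A quick sweep over processes against the rough thresholds from this section can look like the sketch below (psutil again; num_handles() exists only on Windows, and note the thread rule above is really about the swing between minimum and maximum, so a static count is only a cheap first filter):

```python
import psutil  # pip install psutil

# Flag processes worth a closer look: >10,000 handles, or a thread
# count so high that a 500-thread swing is plausible.
for p in psutil.process_iter(["name"]):
    try:
        handles = p.num_handles()   # Windows-only psutil call
        threads = p.num_threads()
        if handles > 10_000 or threads > 500:
            print(f"{p.pid:>6} {p.info['name']}: "
                  f"{handles} handles, {threads} threads")
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue  # system processes we are not allowed to inspect
```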

Most of the counters above have no single clear-cut threshold that says a bottleneck has appeared in the system. All the values given were derived from averaged results and can vary widely between systems. To use these counters competently, we must at least know the system's indicators when it is healthy. This is called a performance baseline: a PerfMon log captured on a working, freshly installed system with no problems (the "freshly installed" part is optional - it is never too late to capture such a log, or to account for drift by refreshing the baseline over time). This is an important step that many skip, although down the road it can seriously reduce potential downtime and clearly speed up the analysis of data from the counters above.
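A baseline does not require anything fancy; even a sketch like the following (sampling a few key gauges once a minute into a CSV kept from a known-good day) gives you something to compare against later:

```python
import csv
import time
import psutil  # pip install psutil

FIELDS = ["timestamp", "cpu_pct", "ram_pct", "disk_read_time_ms", "net_bytes"]

with open("baseline.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(FIELDS)
    for _ in range(60):  # roughly one hour at one sample per minute
        io = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        writer.writerow([
            int(time.time()),
            psutil.cpu_percent(interval=1),
            psutil.virtual_memory().percent,
            io.read_time,                     # cumulative; diff when analyzing
            net.bytes_sent + net.bytes_recv,  # cumulative; diff when analyzing
        ])
        time.sleep(59)
```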

Taken from https://ru.intel.com/business/community/?automodule=blog&blogid=57161&sh...


FX vs Core i7 | Looking for bottlenecks in an Eyefinity configuration

We have watched processor performance double every three or four years. Yet the most demanding game engines we test are as old as the Core 2 Duo. Surely CPU bottlenecks should be a thing of the past, right? As it turns out, GPU speed is growing even faster than CPU performance, so the debate over buying a faster CPU versus building up graphics power continues.

But there always comes a point where arguing is pointless. For us it came when games started running smoothly on a very large monitor at its native resolution of 2560x1600. If a faster component delivers an average of 200 instead of 120 frames per second, the difference still won't be noticeable.

In response to the lack of higher resolutions for fast graphics adapters, AMD introduced Eyefinity technology and NVIDIA introduced Surround. Both let you game on more than one monitor, and for high-end GPUs running at 5760x1080 has become an objective reality. In fact, three 1920x1080 displays will cost less and impress you more than a single 2560x1600 screen - hence a reason to spend on more powerful graphics solutions.

But is a powerful processor really necessary to play without stutter at 5760x1080? The question turned out to be interesting.

AMD recently introduced a new architecture, and we bought a boxed FX-8350. In the article "AMD FX-8350 Review and Test: Does Piledriver Fix Bulldozer's Shortcomings?" we found a lot to like in the new processor.

From an economic standpoint, in this comparison Intel has to prove that it is not only faster than the AMD chip in games, but also worth the large difference in price.


Both motherboards belong to the ASUS Sabertooth family, but the company asks a higher price for the LGA 1155 model, which further complicates Intel's position on the budget front. We deliberately chose these platforms so that the performance comparison would be as fair as possible, without taking board cost into account.

FX vs Core i7 | Configuration and tests

While we waited for the boxed FX-8350 to arrive in the test lab, we prepared our tests. Given that the AMD processor reaches 4.4 GHz without any trouble, we started testing the Intel chip at the same frequency. It later turned out that we had underestimated our samples, as both CPUs hit 4.5 GHz at the chosen voltage.

We did not want to delay publication for retesting at higher frequencies, so we decided to keep the test results at 4.4 GHz.

Test configuration
Intel CPU: Intel Core i7-3770K (Ivy Bridge), 3.5 GHz, 8 MB shared L3 cache, LGA 1155; overclocked to 4.4 GHz at 1.25 V
Intel motherboard: ASUS Sabertooth Z77, BIOS 1504 (03/03/2012)
Intel CPU cooler: Thermalright MUX-120 w/ Zalman ZM-STG1 paste
AMD CPU: AMD FX-8350 (Vishera), 4.0 GHz, 8 MB shared L3 cache, Socket AM3+; overclocked to 4.4 GHz at 1.35 V
AMD motherboard: ASUS Sabertooth 990FX, BIOS 1604 (10/24/2012)
AMD CPU cooler: Sunbeamtech Core-Contact Freezer w/ Zalman ZM-STG1 paste
Network: built-in Gigabit LAN controller
Memory: G.Skill F3-17600CL9Q-16GBXLD (16 GB), DDR3-2200, CAS 9-11-9-36, 1.65 V
Video cards: 2 x MSI R7970-2PMD3GD5/OC, GPU at 1010 MHz, GDDR5-5500
Storage: Mushkin Chronos Deluxe DX 240 GB, SATA 6 Gb/s SSD
Power supply: Seasonic X760 SS-760KM, ATX12V v2.3, EPS12V, 80 PLUS Gold
Software and drivers
Operating system: Microsoft Windows 8 Professional RTM x64
Graphics driver: AMD Catalyst 12.10

Thanks to their high efficiency and quick installation, we have been using the Thermalright MUX-120 and Sunbeamtech Core-Contact Freezer coolers for several years. However, the mounting brackets bundled with these models are not interchangeable.


The G.Skill F3-17600CL9Q-16GBXLD memory modules are rated DDR3-2200 CAS 9 and use Intel XMP profiles for semi-automatic configuration. The Sabertooth 990FX applies the XMP values via ASUS DOCP.

The Seasonic X760 power supply provides the high efficiency needed to gauge platform differences.

StarCraft II does not support AMD Eyefinity, so we decided to use older titles: Aliens vs. Predator and Metro 2033.

Test configuration (3D games)
Aliens vs. Predator, using AVP Tool v.1.03, SSAO/tessellation/shadows on
Test setup 1: High texture quality, no AA, 4x AF
Test setup 2: Very High texture quality, 4x AA, 16x AF
Battlefield 3, campaign mode, "Going Hunting", 90-second Fraps run
Test setup 1: Medium quality (no AA, 4x AF)
Test setup 2: Ultra quality (4x AA, 16x AF)
F1 2012, Steam version, built-in benchmark
Test setup 1: High quality, no AA
Test setup 2: Ultra quality, 8x AA
Elder Scrolls V: Skyrim, update 1.7, Celedon Aethirborn level 6, 25-second Fraps run
Test setup 1: DX11, High detail level, no AA, 8x AF, FXAA on
Test setup 2: DX11, Ultra detail level, 8x AA, 16x AF, FXAA on
Metro 2033, full version, built-in benchmark, "Frontline" scene
Test setup 1: DX11, High, AAA, 4x AF, no PhysX, no DOF
Test setup 2: DX11, Very High, 4x AA, 16x AF, no PhysX, DOF on

FX vs Core i7 | Test results

Battlefield 3, F1 2012 and Skyrim

But first let's take a look at power consumption and efficiency.

The power consumption of the non-overclocked FX-8350 compared to the Intel chip is not so terrible, though it is in fact higher. However, the chart does not show the whole picture. We never saw the chip run at 4 GHz under constant load at default settings. Instead, while processing eight threads in Prime95, it lowered its multiplier and voltage to stay within the declared thermal package. Throttling artificially restrains the CPU's power consumption. Setting a fixed multiplier and voltage raises this figure significantly on the Vishera processor when overclocking.

At the same time, not every game can exploit the FX-8350's ability to process eight data streams at once, so they can never push the chip to the point where the throttling mechanism kicks in.

As noted, throttling does not activate on the non-overclocked FX-8350 during games, since most games cannot load the processor fully. In fact, games benefit from Turbo Core technology, which raises the processor frequency to 4.2 GHz. The AMD chip showed its worst on the average performance chart, where Intel pulls noticeably ahead.

For the efficiency chart we use the average power draw and the average performance of all four configurations as a baseline. In this chart, the performance per watt of the AMD FX-8350 is about two-thirds of the Intel result.

FX vs Core i7 | Can the AMD FX keep up with the Radeon HD 7970?

When we talk about good, affordable hardware, we love phrases like "80% of the performance for 60% of the price". These metrics are usually quite honest, since we are already in the habit of measuring performance, power draw and efficiency. However, they account for the cost of only one component, and components, as a rule, cannot work alone.

Adding up the components used in today's review, the Intel-based system comes to $1,900 and the AMD platform to $1,724, not counting the case, peripherals and operating system. If we look at "complete" builds, it is worth adding about $80 for a case, giving $1,984 for Intel and $1,804 for AMD. The savings on the finished AMD-based configuration are $180; as a percentage of the total system cost, not much. In other words, the other components of a high-end computer dilute the value of a more affordable processor.
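As a sanity check on those numbers, a couple of lines of arithmetic (figures taken from the paragraph above) show why the saving looks modest as a share of the whole build:

```python
# Figures from the paragraph above: components plus ~$80 for a case.
intel_total = 1900 + 80
amd_total = 1724 + 80

saving = intel_total - amd_total
print(saving)                                # 180
print(f"{100 * saving / intel_total:.1f}%")  # ~9.1% of the Intel build's price
```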

As a result, we are left with two admittedly biased methods of weighing price against performance. We admit it openly, so we hope you will not condemn us for the results presented.

The picture is more favorable for AMD if we include only the cost of the motherboard and CPU, which inflates its advantage. The resulting chart looks like this:

As a third alternative, you can treat the motherboard and processor as an upgrade, assuming the case, power supply, memory and drives remain from the previous system. Most likely a pair of Radeon HD 7970 video cards was not part of the old configuration, so it is reasonable to count processors, motherboards and graphics adapters together. Thus we add two Tahiti-GPU video cards at $800.

The AMD FX-8350 looks more attractive than Intel (especially in games at the settings we chose) in only one case: when the rest of the system is "free". Since the remaining components cannot be free, the FX-8350 likewise cannot become a bargain buy for gaming.

AMD, Intel and Video Cards

Our test results have long shown that ATI graphics chips are more CPU-dependent than NVIDIA chips. As a result, when testing high-end GPUs we equip our test benches with Intel processors, side-stepping platform drawbacks that could interfere with isolating graphics performance and skew the results.

We hoped that AMD's Piledriver would change the situation, but even its several impressive improvements were not enough for the CPU team to match the effectiveness of AMD's own graphics division. For now, we wait for AMD chips based on the Steamroller architecture, which promises to be 15% faster than Piledriver.

When building a gaming PC the most expensive part is the video card, and you want it to deliver its money's worth. The question then arises: which processor should be chosen for that video card so that it does not limit it in games? Our specially prepared material will help you with this dilemma.

Introduction

It so happens that the main thing in a computer is the processor, and it commands everything else. It is the processor that gives your video card orders to draw certain objects, and it also calculates object physics (the processor handles some operations too). If the video card is not working at full power while the processor cannot go any faster, the "bottleneck" effect occurs: system performance is limited by its weakest component.

In reality there are always workloads where the video card is not strained at all while the CPU toils at full tilt, but here we are talking about games, so we will reason within that paradigm.

How is the load distributed between the processor and the video card?

It should be noted that as the game's settings change, the ratio of processor load to video card load changes too.

As resolution and settings increase, the load on the video card grows faster than the load on the processor. This means that if the processor is not a bottleneck at lower resolutions, it will not become one at higher resolutions.

With lower resolution and graphics settings it is the other way around: the processor's work per frame barely changes, while the video card's job gets much easier. In that situation the processor is far more likely to become the bottleneck.

What are the signs of a bottleneck?

For the test you need a monitoring program in which you can watch the "GPU load" graph.

You also need to know the processor load. This can be seen in the Task Manager's performance monitoring, which has a CPU load graph.

So, what are the signs that the processor is not revealing the video card?

  • GPU load is not close to 100%, while CPU load hovers near that mark the whole time
  • GPU load jumps around sharply (possibly just a poorly optimized game)
  • FPS does not change when you change the graphics settings

It is by these signs that you can tell whether a bottleneck is present in your case.
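Automating that check is straightforward on an NVIDIA card; a minimal sketch, assuming the stock nvidia-smi utility for GPU load and psutil for CPU load (the thresholds are my own rough picks, not canon):

```python
import subprocess
import psutil  # pip install psutil

def gpu_load_pct() -> int:
    """GPU utilization in percent via nvidia-smi (NVIDIA cards only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    ).stdout
    return int(out.strip().splitlines()[0])

cpu = psutil.cpu_percent(interval=5)  # 5-second average with the game running
gpu = gpu_load_pct()

if cpu > 90 and gpu < 80:
    print(f"CPU {cpu}%, GPU {gpu}%: looks like a processor bottleneck")
elif gpu > 90:
    print(f"CPU {cpu}%, GPU {gpu}%: GPU fully loaded, which is fine")
else:
    print(f"CPU {cpu}%, GPU {gpu}%: no clear bottleneck")
```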

How to approach the choice of processor?

To do this, I advise looking at processor tests in the game you care about. There are sites that specialize in exactly this ().

An example test in Tom Clancy's The Division:

Typically, processor tests in various games state the graphics settings and the resolution. The conditions are chosen so that the bottleneck is the processor. In that case you can find out how many frames a given processor can manage at that resolution, and so compare processors with one another.

Games differ (captain obvious) and their processor requirements may differ too. In one game everything will be fine and the processor will cope with the scenes without any problems, while in another the video card will sit cooling off as the processor struggles to perform its tasks.

What affects this most:

  • the complexity of the physics in the game
  • complex scene geometry (many large buildings with lots of detail)
  • artificial intelligence

Our advice

  • When choosing, we advise going by tests at exactly the graphics settings you need and the FPS you want (whatever your card will pull).
  • It is advisable to look at the most demanding games if you want to be sure that future releases will also run well.
  • You can also take a processor with headroom. Games now run well even on chips from 4 years ago (), which means a good processor bought today will keep you happy in games for a very long time.
  • If the FPS in a game is fine but the video card load is low, load it up: raise the graphics settings so the video card works at full capacity.
  • With DirectX 12 the load on the processor should drop slightly, which lowers the demands placed on it.

Technical progress obviously does not move uniformly in all areas. In this article we will look at which components, at which times, improved more slowly than the rest and so became the weak link. Today's topic, then, is the evolution of weak links: how they arose, what they affected, and how they were eliminated.

CPU

From the earliest personal computers the bulk of the computation fell on the CPU. The reason was that chips were not exactly cheap, so most peripherals used processor time for their own needs; besides, there was little peripheral hardware back then. Soon, as the PC's range of uses expanded, this paradigm was revised. The heyday of all kinds of expansion cards had arrived.



In the days of the "two-eighty-sixes" and "three-eighty-sixes" (that is not Pentium II and III, as younger readers might assume, but the i286 and i386 processors), the tasks given to systems were not very complex: mainly office applications and calculations. Expansion cards already offloaded the processor in part; for example, an MPEG decoder card decompressed MPEG files without the CPU's involvement. A little later, standards began to appear that loaded the processor less during data exchange. One example was the PCI bus (introduced starting with the i486), transfers over which load the processor to a lesser degree. Other examples are the PIO and (U)DMA modes.


Processors gained power at a good pace. A clock multiplier appeared, since the system bus speed was limited, and caches appeared to mask requests to main memory, which runs at a lower frequency. The processor still remained the weak link, and overall speed depended on it almost entirely.



Meanwhile Intel, after releasing the rather good Pentium processor, brought out a new generation, the Pentium MMX. Intel wanted to change the state of affairs and move the calculations onto the processor. The MMX instruction set (MultiMedia eXtensions), intended to accelerate audio and video processing, helped a great deal: with it, MP3 music played back normally, and acceptable MPEG4 playback could be achieved by the CPU alone.

The first bus bottlenecks

Systems with the Pentium MMX processor were already running into memory bandwidth limits. The 66 MHz bus was a bottleneck for the new processor, despite the move to the new SDRAM memory type, which improved performance per megahertz. For that reason, overclocking over the bus became very popular: the bus was set to 83 MHz (or 75 MHz), giving a very noticeable boost. Often even a lower final processor frequency was compensated by the higher bus frequency; for the first time, more speed could be had at a lower CPU clock. Another bottleneck was the amount of RAM: for SIMM memory it was at most 64 MB, and more often 32 MB or even 16. This seriously complicated running programs, since every new Windows version, as we know, loves to "eat a lot of tasty RAM" (c). No wonder there are rumors of collusion between memory manufacturers and Microsoft.



Intel, meanwhile, set about developing the expensive and therefore not very popular Socket 8 platform, while AMD kept developing Socket 7. Unfortunately, the latter used in its products the slow FPU (Floating Point Unit, the module for floating-point operations) created by NexGen, which AMD had recently acquired; this meant a lag behind the competitor in multimedia tasks, first of all games. The move to a 100 MHz bus gave the processors the memory bandwidth they needed, and the full-speed 256 KB second-level cache on the AMD K6-III improved matters so much that system speed was now characterized only by processor frequency, not by the bus. That said, this was partly a consequence of the slow FPU: office applications, which depend on ALU power, ran faster on AMD's parts than on the competitor's thanks to the fast memory subsystem.

Chipsets

Intel abandoned the expensive Pentium Pro, which had the L2 cache die integrated into the processor package, and released the Pentium II. This CPU had a core very similar to the Pentium MMX core. The main differences were the L2 cache, located on the processor cartridge and running at half the core frequency, and the new bus, AGTL+. With the help of new chipsets (in particular the i440BX) the bus frequency was raised to 100 MHz, and memory bandwidth with it. In terms of efficiency (the ratio of random read speed to the theoretical maximum) these chipsets became among the best, and Intel has not beaten that mark to this day. The i440BX series had one weak link: the south bridge, whose functionality no longer met the requirements of the time; the old south bridge of the i430 series, used in Pentium I systems, was carried over. That circumstance, along with the PCI-bus link between the bridges, prompted manufacturers to release hybrids combining the i440BX north bridge with a VIA (686A/B) south bridge.



Meanwhile Intel demonstrated DVD movie playback without auxiliary cards. But the Pentium II never gained broad recognition because of its high price, and the need for cheap derivatives became obvious. The first attempt, an Intel Celeron without L2 cache, failed: in speed the Covington chips lost badly to competitors, and their price was not justified. Intel's second attempt was a success: the Mendocino core, adored by overclockers, had half the cache (128 KB versus 256 KB in the Pentium II) but ran it at twice the frequency - at the full processor clock, not at half speed as in the Pentium II. Thanks to that, speed in most tasks was no lower, and the lower price attracted buyers.

The first 3D, and the bus again

Right after the release of the Pentium MMX, the popularization of 3D technologies began. At first these were professional applications for modeling and graphics, but the real era was opened by 3D games - or rather, by the 3D Voodoo accelerators created by 3dfx. These accelerators became the first mass-market cards for building 3D scenes, offloading the processor during rendering. From that moment the evolution of three-dimensional games began its count. Quite quickly, computing the scene with the central processor began to lose to rendering with the video accelerator, in both speed and quality.



With the arrival of a powerful new subsystem - graphics - whose volume of computed data began to rival the central processor's, a new bottleneck emerged: the PCI bus. In particular, Voodoo 3 cards and up gained speed simply from overclocking the PCI bus to 37.5 or 41.5 MHz. Clearly, video cards needed a fast enough bus of their own. That bus (or rather, port) became AGP, the Accelerated Graphics Port. As the name implies, it is a dedicated graphics bus, and by specification it could have only one slot. The first AGP version supported the 1x and 2x speeds, corresponding to single and double PCI 32-bit/66 MHz speed, that is, 266 and 533 MB/s. The slow mode was added for compatibility, and for quite a long time it caused problems: there were issues with every chipset except those released by Intel. Rumor has it these problems stemmed from the license existing only at that company, and from its hindering of the competing Socket 7 platform.



AGP improved the state of affairs, and the graphics port ceased to be a bottleneck. Video cards moved to it very quickly, but the Socket 7 platform suffered from compatibility problems almost to the very end. Only the last chipsets and drivers managed to improve the situation, and even then with caveats.

And now the hard drives!

Then came the Coppermine era: frequencies grew, performance grew, the new video cards boosted their pipelines and memory. The computer had already become a multimedia center where people played music and watched movies. Integrated sound cards, weak in their specifications, gave way to the SB Live!, the people's choice. But something still spoiled the idyll. What was it?



That factor was the hard drives, whose capacity growth had slowed and stopped at around 40 GB. For movie collectors (in MPEG4 back then) this caused difficulties. Soon the problem was solved, and rather quickly: drives grew to 80 GB and beyond and stopped worrying most users.


AMD produced a very good platform, Socket A, and the K7-architecture processor that the marketers named Athlon (technical name Argon), as well as the budget Duron. The Athlons' strong side was a powerful FPU, which made them excellent processors for serious calculations and for games, leaving to the competitor, the Pentium 4, the role of office machines - where, to be fair, powerful systems were never needed anyway. The early Durons had a very small cache and low bus frequency, which made competing with the Intel Celeron (Tualatin) hard. But thanks to better scalability (owing to the faster bus) they responded better to frequency growth, so the higher-end models already calmly outran Intel's offerings.

Between two bridges


During this period two bottlenecks appeared at once. The first was the bus between the bridges. Traditionally PCI served this purpose. It is worth remembering that PCI, in the variant used in desktop computers, has a theoretical bandwidth of 133 MB/s. In practice the speed depends on the chipset and the application, and ranges from 90 to 120 MB/s. On top of that, the bandwidth is shared among all devices connected to it. If two IDE channels with a theoretical bandwidth of 100 MB/s each (ATA-100) hang off a bus with a theoretical bandwidth of 133 MB/s, the problem is obvious. LPC, PS/2, SMBus and AC'97 have low bandwidth requirements, but Ethernet, ATA 100/133, PCI and USB 1.1/2.0 already operate at speeds comparable to the inter-bridge link. For a long time there was no problem: USB went unused, Ethernet was needed infrequently and mostly at 100 Mbit/s (12.5 MB/s), and hard drives could not even come close to the interface's maximum speed. But time passed and the situation changed, so it was decided to create a dedicated inter-bridge (hub) bus.
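A quick back-of-the-envelope calculation with the theoretical numbers from the paragraph above makes the oversubscription obvious:

```python
# Shared PCI bus vs. the devices hanging off it (theoretical peaks).
pci_bus_mb_s = 133          # ~90-120 MB/s in practice
two_ide_channels = 2 * 100  # two ATA-100 channels
ethernet_mb_s = 12.5        # 100 Mbit/s
usb2_mb_s = 60              # 480 Mbit/s

demand = two_ide_channels + ethernet_mb_s + usb2_mb_s
print(f"demand {demand} MB/s vs bus {pci_bus_mb_s} MB/s -> "
      f"{demand / pci_bus_mb_s:.1f}x oversubscribed")
```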


VIA, SiS and Intel each released their own variants of such a bus. They differed, above all, in throughput. They started at the level of PCI 32-bit/66 MHz, 266 MB/s, but the main thing was done: the PCI bus now served only its own devices and no longer had to pass data to other buses through itself. This improved the speed of working with peripherals (relative to the bridged architecture).


The graphics port's bandwidth grew as well. The Fast Writes mode made it possible to write data to video memory directly, bypassing system memory, and Side Band Addressing used an additional 8-bit portion of the bus, normally meant for technical data. The gain from FW showed up only under high processor load; in the remaining cases it yielded a pittance. Thus the difference between the 8x and 4x modes fell within the margin of error.

Processor-dependence

Another bottleneck, still relevant today, was processor dependence. This phenomenon arose from the rapid development of video cards, and it meant insufficient power of the "processor - chipset - memory" chain relative to the video card. After all, the frame rate in a game is determined not only by the video card but by that chain as well, since it is the chain that supplies the card with instructions and data to process. If the chain cannot keep up, the video subsystem runs into a ceiling determined mainly by the chain. That ceiling depends on the card's power and the settings used, but there are cards that hit such a ceiling with almost any processor - at any settings in a particular game, or at identical settings in most modern games. For example, the GeForce 3 was heavily held back by the performance of the Pentium III and the Willamette-core Pentium 4. The slightly later GeForce 4 Ti was no longer satisfied even by an Athlon 2100+-2400+, and the gain from improving that chain was very noticeable.



How were those characteristics improved? At first AMD, enjoying the fruits of its efficient architecture, simply raised processor frequencies and refined the process technology, while chipset makers raised memory bandwidth. Intel kept to its policy of raising clock speeds; the NetBurst architecture was designed exactly for that. Intel processors on the Willamette and Northwood cores with the 400QPB bus (Quad Pumped Bus) lost to competing solutions on the 266 MHz bus. After the introduction of 533QPB the processors became equal in performance. But then, instead of the 667 MHz bus that went into server solutions, Intel decided to move desktop processors straight to the 800 MHz bus, to keep a power reserve for competing with the Barton core and the new top-end Athlon XP 3200+. Intel's processors were badly limited by bus frequency; even 533QPB was not enough to feed them a sufficient data stream. That is why the released 3.0 GHz CPU on the 800 MHz bus outran the 3.06 GHz processor on the 533 MHz bus in everything except, perhaps, a small number of applications.


Support for new memory frequencies was also introduced, and dual-channel mode appeared. It was done to equalize the bandwidth of the processor bus and the memory bus: dual-channel DDR corresponds exactly to QDR at the same frequency.


For AMD, dual-channel mode was a formality and gave a barely noticeable gain. The new Prescott core brought no unambiguous speed increase and in places lost to the old Northwood. Its main purpose was the move to a new process technology and the possibility of further frequency growth. Heat output rose greatly because of leakage currents, which put a cross over the release of the model running at 4.0 GHz.

Through the ceiling to new memory

The Radeon 9700/9800 and GeForce 5 generation caused the processors of the day no processor-dependence problems. But the GeForce 6 generation brought most systems to their knees: the performance gain was very noticeable, and with it the processor dependence. Top processors on the Barton core (Athlon XP 2500+ - 3200+) and on Northwood/Prescott (3.0-3.4 GHz, 800FSB) ran into a new limit: the frequency ceiling of the memory and the bus. AMD suffered especially; the 400 MHz bus was not enough to realize the power of its good FPU. The Pentium 4 was in a better position and showed good results with minimal timings. But JEDEC did not want to certify higher-frequency memory modules with larger latencies. So there were two options: either a complex quad-channel mode, or the move to DDR2. The latter happened, and the LGA775 (Socket T) platform was introduced. The bus stayed the same, but memory frequencies were no longer capped at 400 MHz; they only started there.



AMD solved the problem better from the standpoint of scalability. The K8 generation, technical name Hammer, besides raising the number of instructions per clock (partly thanks to a shorter pipeline), had two innovations with a stake in the future. These were the built-in memory controller (more precisely, the north bridge with most of its functions) and the fast universal HyperTransport bus, which connected the processor with the chipset or with other processors in a multiprocessor system. The built-in memory controller made it possible to avoid the weak link, the "chipset - processor" hop. The FSB as such ceased to exist; only the memory bus and the HT bus remained.


This let the Athlon 64 easily overtake Intel's existing NetBurst solutions and expose the flawed ideology of the long pipeline. Tejas had many problems and never came out. These processors easily realized the potential of GeForce 6 cards - as, incidentally, did the top Pentium 4 models.


But then an innovation appeared that made processors the weak link for a long time: multi-GPU. The ideas of 3dfx SLI were revived in NVIDIA SLI, and ATI answered symmetrically with CrossFire. These were technologies for rendering a scene with two cards. The doubled theoretical power of the video subsystem, plus the processor-side cost of splitting the frame into parts, threw the system out of balance. Even a top Athlon 64 could load such a pairing only at high resolutions. GeForce 7 and the ATI Radeon X1000 widened this imbalance further.


Along the way the new PCI Express bus was developed. This bidirectional serial bus is meant for peripherals and is very fast. It came to replace AGP and PCI, though it did not displace the latter completely. Thanks to its versatility, speed and cheap implementation it quickly pushed AGP out, although at the time it brought no speed gain: there was practically no difference between them. But as a unification step it was very good. Boards now come with PCI-E 2.0 support, which doubles the throughput (500 MB/s per lane each way against the previous 250 MB/s). Current video cards gained nothing from it either. A difference between PCI-E modes shows up only when video memory runs short, which in itself signals an imbalance in the card. Such a card is the GeForce 8800 GTS 320 MB: it is very sensitive to the PCI-E mode. But buying an unbalanced card just to appreciate the gain from PCI-E 2.0 is not the most sensible decision. Another matter is cards supporting TurboCache and HyperMemory, technologies for using system RAM as video memory: there the gain in memory bandwidth is roughly twofold, with a positive effect on performance.


Whether a video card has enough memory can be checked in any product line with different VRAM sizes: wherever frames per second drop sharply, video memory is running short. But it happens that the difference becomes really noticeable only in outlandish modes, such as 2560x1600 with AA/AF at maximum. Then a difference of 4 versus 8 frames per second, although twofold, obviously describes two modes that are both unplayable in real conditions, so it should not be taken into account.

The video chip makers' new answer

The release of the new Core 2 architecture (technical name Conroe) improved the processor-dependence situation, and a GeForce 7 SLI setup was loaded without any problems. But Quad SLI and GeForce 8 took their revenge and restored the skew, and so it continues to this day. The situation only worsened with the release of 3-way SLI and the preparations for Quad SLI on the GeForce 8800 and for CrossFire X 3-way and 4-way. Wolfdale raised clock frequencies slightly, but even overclocking this processor is not enough to load such video systems properly. 64-bit games are a rarity, and gains in that mode occur in isolated cases. Games that gain from four cores can be counted on the fingers of one hand. As usual, Microsoft pulls everyone forward, loading both memory and processor with its new OS like nobody's business. It is claimed that 3-way SLI and CrossFire X will work exclusively under Vista. Given the OS's appetites, gamers may well be forced onto quad-core processors: core loading there is more even than in Windows XP. If the OS must be given a fair share of processor time, let it at least eat up the cores that the game does not fully use. Though I doubt the new operating system will settle for the cores it is given.



The Intel platform is outliving itself. Four cores already suffer badly from insufficient memory bandwidth and from the delays of bus switching. The bus is shared, and time is needed to take control of it. With two cores this is tolerable, but with four the temporal losses become noticeable. The system bus also stopped keeping up with the memory bandwidth long ago. The effect of this factor was softened by improving the efficiency of the asynchronous mode, which Intel implemented well. Workstations suffer from this to an even greater degree, through the fault of an unfortunate chipset whose memory controller delivers only up to 33% of the theoretical bandwidth. An example is the Intel Skulltrail platform losing in most game applications (and 3DMark06 CPU Test is not a gaming application :)) even with identical video cards. That is why Intel announced the new Nehalem generation, implementing an infrastructure very similar to AMD's developments: an integrated memory controller and the QPI peripheral bus (technical name CSI). This will improve platform scalability and give positive results in dual-processor and multi-core configurations.


AMD at the moment has several bottlenecks of its own. The first is tied to the caching mechanism: because of it there is a certain memory-bandwidth ceiling that depends on processor frequency, and it cannot be jumped over even with higher-frequency memory modes. For example, with a mid-range processor the difference in memory performance between DDR2 667 and 800 MHz can be about 1-3%; for a real task it is generally negligible. So it is best to choose the optimal frequency and lower the timings, to which the controller responds very well. Implementing DDR3 therefore makes little sense: the larger timings only hurt, and there may be no gain at all. AMD's other problem now is the slow (despite SSE128) processing of SIMD instructions. It is for this reason that Core 2 greatly outruns K8/K10. The ALU, always Intel's strong point, has become even stronger, and in some cases can be several times faster than its counterpart in the Phenom. That is the main trouble of AMD processors: weak "math".


Generally speaking, weak links depend heavily on the specific task; only the "epochal" ones were covered here. In some tasks, speed can run up against the amount of RAM or the speed of the disk subsystem. Then more memory is added (the amount is determined with performance counters) and RAID arrays are installed. Game speed can be raised by disabling the integrated sound card and buying a proper discrete one - a Creative Audigy 2 or X-Fi - which load the processor less by processing effects with their own chip. This applies more to AC'97 sound cards and less to HD Audio (Intel Azalia), where the problem of high processor load was fixed.


Remember: a system should always be built for specific tasks. Often, while a video card can be chosen in a balanced way (and the choice between price categories will depend on prices, which vary greatly from shop to shop), the disk subsystem, say, does not always offer that option. Very few people need RAID 5, but for a server it is an indispensable thing. The same goes for a dual-processor or multi-core configuration: useless in office applications, but a "must have" for a designer working in 3ds Max.


