Tuesday, June 4, 2019

Pentium Memory Management Unit Computer Science Essay

The main aim of this research paper is to analyze the Pentium memory management unit. It discusses key features associated with a memory management unit, such as segmentation, paging and their protection, the cache associated with the MMU in the form of the translation look-aside buffer, and how these features can be used to optimize the microprocessor's performance. Some problems related to the Pentium memory management unit and their respective solutions are covered, as is current and future research work in the field of memory management. The main challenge is to become familiar with the Pentium memory management unit and to analyze the crucial factors related to it.

Introduction

A hardware component responsible for handling the different accesses to memory requested by the CPU is known as a memory management unit (MMU), also termed a paged memory management unit (PMMU). The main functions of an MMU can be categorized as follows: [1]

- Translation of virtual addresses to physical addresses, also known as virtual memory management (VMM)
- Memory protection
- Cache control
- Bus arbitration
- Bank switching

The memory system for the Pentium microprocessor is 4G bytes in size, just as in the 80386DX and 80486 microprocessors. The Pentium uses a 64-bit data bus to address memory organized in eight banks that each contain 512M bytes of data.

Most microprocessors, including the Pentium, also support the virtual memory concept with the help of the memory management unit. Virtual memory is used to manage the resource of physical memory. It gives an application the illusion of a very large amount of memory, typically much larger than what is actually available. It supports the execution of processes that are only partially resident in memory.
Only the most recently used portions of a process's address space actually occupy physical memory; the rest of the address space is stored on disk until needed. The Intel Pentium microprocessor supports both segmentation and segmentation with paging.

Another important feature supported by Pentium processors is memory protection. This mechanism limits access to certain segments or pages based on privilege levels and thus protects critical data, if kept at the most privileged level, from different attacks.

Intel's Pentium processor also supports caches, translation look-aside buffers (TLBs), and a store buffer for temporary on-chip (and external) storage of instructions and data.

Another major issue resolved by the MMU is the fragmentation of memory. Sometimes the largest contiguous block of free memory is much smaller than the total available memory because of fragmentation. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory. [1]

This research paper revolves around the different functions associated with the memory management unit of Pentium processors. These include virtual memory management, memory protection, cache control and so on. The Pentium's memory management unit has some problems associated with it as well as some benefits, which are covered in detail in the later part. The features mentioned above help solve major performance issues and have given a boost to the microprocessor world.

History

In some early microprocessor designs, memory management was performed by a separate integrated circuit such as the VLSI VI475, the Motorola 68851 used with the Motorola 68020 CPU in the Macintosh II, or the Z8015 used with the Zilog Z8000 family of processors.
Later microprocessors such as the Motorola 68030 and the Zilog Z280 placed the MMU together with the CPU on the same integrated circuit, as did the Intel 80286 and later x86 microprocessors.

In the x86 family, the first memory management unit came into existence with the release of the 80286 microprocessor in 1982. For the first time, the 80286 offered on-chip memory management, which made it suitable for multitasking operations. On many machines, cache access time limits the clock cycle rate, so it affects more than just the average memory access time. Therefore, to achieve fast access times, fitting the cache on chip was very important, and on-chip memory management paved the way.

The major functions associated with memory management are segmentation and paging. The segmentation unit first appeared on the 8086 processor, where its only purpose was to serve as a gateway to the 1 MB physical address space. To allow easy porting of old applications to the new environment, Intel decided to keep the segmentation unit alive in protected mode. Protected mode does not use fixed-size memory blocks; instead, the size and location of each segment are set in an associated data structure called a segment descriptor. All memory references are made relative to the base address of their corresponding segment, which makes relocation of program modules fairly easy and spares the operating system from performing code fix-ups when it loads applications into memory. [2]

With paging enabled, the processor adds an extra level of indirection to the memory translation process. Instead of serving as a physical address, an application-generated address is used by the processor to index one of its look-up tables. The corresponding entry in the table contains the actual physical address, which is sent to the processor address bus.
Through the use of paging, operating systems can create a distinct address space for each running application, thus simplifying memory access and preventing potential conflicts.

Virtual memory allows applications to allocate more memory than is physically available. This is done by keeping memory pages partially in RAM and partially on disk. When a program tries to access an on-disk page, an exception is generated and the operating system reloads the page so that the faulting application can resume its execution. [2]

The Pentium 4 was Intel's final endeavor in the realm of single-core CPUs. The Pentium 4 had an on-die cache memory of 8 to 16 KB. The Pentium 4 memory cache is a memory location on the CPU used to store instructions to be processed. The Pentium 4 on-die memory cache is an extremely fast memory location which stored decoded instructions, known as microcode, that were about to be executed by the CPU. [3]

By today's standards, the Pentium 4 cache is very lacking in capacity. This lack of cache memory means the CPU must make more calls to RAM for instructions. These calls to RAM reduce performance, as the latency involved in transferring data from RAM is much higher than from the on-die cache. Often overlooked, the cache size of any CPU is of vast importance in predicting the performance of a computer processor. While the Pentium 4's level-one cache was very limited by today's standards, it was at the time of its release more than adequate for the majority of computer applications. [4]

Likely the Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KB at introduction to 1 MB in 1997. Intel placed the L2 die(s) separately in the package, which still allowed the cache to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes that shared the main system bus with the CPU, the Pentium Pro's cache had its own back-side bus.
Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck. The cache was also non-blocking, meaning that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties. These properties combined to produce an L2 cache that was immensely faster than the motherboard-based caches of older processors. This cache alone gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor configurations, the Pentium Pro's integrated cache skyrocketed performance in comparison to architectures in which each CPU shared a central cache. [4]

However, this far faster L2 cache did come with some complications. The processor and the cache were on separate dies in the same package, connected closely by a full-speed bus. The two or three dies had to be bonded together early in the production process, before testing was possible. This meant that a single, tiny flaw in either die made it necessary to discard the entire assembly. [5]

Technical Aspects of Pentium's Memory Management Unit

Virtual Memory Management in Pentium

The memory management unit in the Pentium is upward compatible with the 80386 and 80486 microprocessors. The linear address space for the Pentium microprocessor is 4G bytes, that is, from 0 to (2^32 - 1). The MMU translates a virtual address to a physical address in less than a single clock cycle for a hit, and it also minimizes the cache fetch time for a miss. The CPU generates a logical address, which is given to the segmentation unit. The segmentation unit produces a linear address, which is then given to the paging unit, and the paging unit generates the physical address in main memory. Hence, the paging and segmentation units are sub-units of the MMU.

Figure 3.1: Logical to Physical Address Translation in Pentium

The Pentium can run in either of two modes, real or protected.
Real mode does not allow multitasking, because nothing protects one process from interference by another, whereas in protected mode each process runs in a separate code segment. Segments have different privilege levels, preventing a lower-privilege process (such as an application) from running a higher-privilege one (e.g. the operating system). A Pentium running in protected mode supports both segmentation and segmentation with paging.

Segmentation in Pentium

Segmentation divides programs into logical blocks and places them in different memory areas. This makes it possible to regulate access to critical sections of the application and helps locate bugs during the development process. Its features include defining the exact location and size of each segment in memory and setting a specific privilege level for a segment, which protects its contents from unauthorized access. [6]

Segment registers are now called segment selectors because they do not map directly to a physical address but point to an entry of the descriptor table. The Pentium CPU has six 16-bit segment registers, called selectors. A logical address consists of a 16-bit segment selector and a 32-bit offset. The figure below shows a multi-segment model, which uses the full capabilities of the segmentation mechanism to provide hardware-enforced protection of code, data structures, programs and tasks. This is supported by the IA-32 architecture. Here, each program is given its own table of segment descriptors and its own segments.

Figure: Multi-Segment Model

When the processor needs to translate a memory location SEGMENT:OFFSET to its corresponding physical address, it takes the following steps: [7]

Step 1: Find the start of the descriptor table (GDTR register).

The figure below shows how CPU selectors provide an index (pointer) to segment descriptors stored in RAM in memory structures called descriptor tables.
Then, that address is combined with the offset to locate a specific linear address.

Figure: Selector to descriptor and then to linear address in the Pentium MMU

Step 2: Find the entry of the table selected by SEGMENT; this is the segment descriptor corresponding to the segment.

There are two types of descriptor tables, the Global Descriptor Table and the Local Descriptor Table.

- Global Descriptor Table: contains segment definitions that apply to all programs, such as the code belonging to operating system segments created by the OS before the CPU switched to protected mode.
- Local Descriptor Table: these tables are unique to an application.

The figure below shows how the entry of the segment table is located and how the segment descriptor corresponding to the segment is selected. [7]

Figure: Global and Local Descriptor Table

The Pentium has a 32-bit base address, which allows segments to begin at any location in its 4G bytes of memory. The figure below shows the format of a descriptor of a Pentium processor. [7]

Figure: Pentium Descriptor Format

Step 3: Find the base physical address of the segment.

Step 4: Compute the address = base + OFFSET. [7]

Paging Unit

Paging is an address translation from linear to physical addresses. The linear address space is divided into fixed-length pages, and similarly the physical address space is divided into frames of the same fixed length. Within their respective address spaces, pages and frames are numbered sequentially. Pages that have no frames assigned to them are stored on disk. When the CPU needs to run code on a page with no assigned frame, it generates a page fault exception, upon which the operating system reassigns a currently unused frame to that page and copies the code for that page from disk to the newly assigned RAM frame. [9]

The Pentium MMU uses a two-level page table to translate a virtual address to a physical address. The page directory contains 1024 32-bit page directory entries (PDEs), each of which points to one of 1024 level-2 page tables.
Each page table contains 1024 32-bit page table entries (PTEs), each of which points to a page in physical memory or on disk. The page directory base register (PDBR) points to the beginning of the page directory.

Figure: Pentium multi-level page table [8]

For 4 KB pages, the Pentium uses a two-level paging scheme in which the 32-bit linear address is divided as shown in the following figure.

Figure: Division of 32-bit linear address

The figure below shows the complete address translation process in the Pentium, i.e. from the CPU's virtual address to main memory's physical address.

Figure: Summary of Pentium address translation [8]

The size of a paging table is dynamic and can become large in a system that contains a large memory. With the Pentium's 4M-byte paging feature, there is just a single page directory and no page tables. Basically, this mechanism helps the operating system create a virtual (faked) address space by swapping code between disk and RAM. This procedure is known as virtual memory support. [9] The paging mechanism in the Pentium works with 4K-byte memory pages or, with a new extension available on the Pentium, with 4M-byte memory pages.

The 20-bit virtual page number (VPN) is partitioned into two 10-bit chunks. VPN1 indexes a PDE in the page directory pointed at by the PDBR. The address in the PDE points to the base of a page table that is indexed by VPN2. The physical page number (PPN) in the PTE indexed by VPN2 is concatenated with the virtual page offset (VPO) to form the physical address. [8]

Figure: Pentium page table translation [8]

Segmentation with Paging in Pentium

The Pentium supports both pure segmentation and segmentation with paging. To select a segment, a program loads a selector for that segment into one of the six segment registers. For example, the CS register holds the selector for the code segment and the DS register holds the selector for the data segment. A selector can specify whether the segment table is local to the process or global to the machine.
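
As a rough illustration of the two-level translation for 4 KB pages described above (VPN1, VPN2 and VPO), the following Python sketch splits a 32-bit linear address and walks a toy page directory. The function names `split_linear` and `translate` are hypothetical, and the page directory and page tables are modelled as plain dictionaries rather than as the 1024-entry in-memory arrays the hardware uses, so this is a sketch of the arithmetic only, not of the Pentium's actual entry formats.

```python
def split_linear(addr):
    """Split a 32-bit linear address into (VPN1, VPN2, VPO).

    VPN1: upper 10 bits, index into the page directory.
    VPN2: next 10 bits, index into the selected page table.
    VPO:  lower 12 bits, offset inside the 4 KB page.
    """
    vpn1 = (addr >> 22) & 0x3FF
    vpn2 = (addr >> 12) & 0x3FF
    vpo = addr & 0xFFF
    return vpn1, vpn2, vpo


def translate(addr, page_directory):
    """Translate a linear address to a physical address.

    page_directory is a toy model (an assumption of this sketch):
    a dict mapping VPN1 to a page table, where each page table is a
    dict mapping VPN2 to a physical page number (PPN). A missing
    entry stands in for a page fault.
    """
    vpn1, vpn2, vpo = split_linear(addr)
    page_table = page_directory.get(vpn1)
    if page_table is None or vpn2 not in page_table:
        raise LookupError("page fault at linear address %#010x" % addr)
    ppn = page_table[vpn2]
    # The PPN is concatenated with the unchanged page offset.
    return (ppn << 12) | vpo
```

For instance, with the toy directory `{1: {3: 0x1234}}`, the linear address `0x00403025` splits into VPN1 = 1, VPN2 = 3 and VPO = 0x025, and `translate` returns `0x01234025`. Note that the low 12 bits pass through untouched, which is why the page offset is said to be unaffected by translation.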
The format of a selector used in the Pentium is shown in the following figure.

Figure: Selector Format

The steps required are as follows:

Step 1: Use the selector to convert the 32-bit virtual offset address to a 32-bit linear address.

Step 2: Convert the 32-bit linear address to a physical address using a two-stage page table.

Figure: Mapping of a linear address onto a physical address [9]

The figure below shows the complete process of segmentation together with paging, which is one of the important functions of the Pentium's memory management unit. [9]

Figure: Segmentation with paging

Some modern processors allow the use of both segmentation and paging, alone or in combination (Motorola 68030 and later, Intel 80386, 80486 and Pentium), so OS designers have a choice, which is summarized in the table below. [9]

Segmentation  Paging  Characteristics and examples
No            No      Small (embedded) systems; low overhead, high performance
No            Yes     Linear address space; BSD UNIX, Windows NT
Yes           No      Better controlled protection and sharing; ST can be kept on chip, predictable access times (Intel 8086)
Yes           Yes     Controlled protection/sharing; better memory management; UNIX Sys. V, OS/2

Figure: Usage of segmentation and paging in different processors

The Intel 80386, 486 and Pentium support the following memory management scheme, which is used in IBM OS/2. The diagram is shown below.

Figure: Intel's memory management scheme implemented in IBM OS/2

3.1.4 Optimizing Address Translation in Pentium processors

The main goal of memory management for address translation is to complete every translation in less than a single clock cycle for a hit and to minimize the cache fetch time for a miss. On a page fault, the page must be fetched from disk, which takes millions of clock cycles and is handled by OS code. To minimize the page fault rate, two methods are used:

1. Smart replacement algorithms: To reduce the page fault rate, the most preferred replacement algorithm is least-recently used (LRU).
In this scheme, a reference bit in each page's page table entry is set to 1 when the page is accessed and is periodically cleared to 0 by the OS. A page with a reference bit equal to 0 has not been used recently. [10]

2. Fast translation using a translation look-aside buffer: Address translation would appear to require extra memory references, i.e. one to access the page table entry and then another for the actual memory access. But access to page tables has good locality, so a fast cache of PTEs inside the CPU, called a translation look-aside buffer (TLB), is used. Typical values in the Pentium are 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, and a 0.01%-1% miss rate. [11]

Page size                   4 KB - 64 KB
Hit time                    50-100 CPU clock cycles
Miss penalty                10^6 - 10^7 clock cycles
  Access time               0.8 x 10^6 - 0.8 x 10^7 clock cycles
  Transfer time             0.2 x 10^6 - 0.2 x 10^7 clock cycles
Miss rate                   0.00001% - 0.001%
Virtual address space size  GB - 16 x 10^18 bytes

Figure: TLB rates

TLB misses, whether handled in hardware or in software, fall into the following two cases:

- The page is in memory, but its physical address is missing from the TLB. A new TLB entry must be created.
- The page is not in memory, and control is transferred to the operating system to deal with a page fault. The fault is handled by raising an exception (interrupt) using the EPC and Cause registers.

There are two kinds of page fault to handle:

Instruction page fault
- Store the state of the process
- Look up the page table to find the disk address of the referenced page
- Choose a physical page to replace
- Start a read from disk for the referenced page
- Execute another process until the read completes
- Restart the instruction which caused the fault [12]

Data access page fault
- Occurs in the middle of an instruction.
- MIPS instructions are restartable: prevent the instruction from completing and restart it from the beginning.
- More complex machines: interrupting instructions (saving the state of the CPU).

3. The other method used to reduce the hit time is to avoid address translation during indexing.
The CPU uses virtual addresses that must be mapped to physical addresses. A cache that is indexed by virtual addresses is called a virtual cache, as opposed to a physical cache. A virtual cache reduces hit time, since a translation from a virtual address to a physical address is not necessary on hits. Also, address translation can be done in parallel with cache access, so penalties for misses are reduced as well.

However, some difficulties are associated with the virtual cache technique, namely that process switches require cache purging. With virtual caches, different processes share the same virtual addresses even though they map to different physical addresses. When a process is swapped out, the cache must be purged of all entries to make sure that the new process gets the correct data. [13]

Different solutions to overcome this problem are:

- PID tags: Increase the width of the cache address tags to include a process ID (instead of purging the cache). The current process PID is specified by a register. If the PID does not match, it is not a hit even if the address matches.
- Anti-aliasing hardware: A hardware solution called anti-aliasing guarantees every cache block a unique physical address. Every virtual address maps to the same location in the cache.
- Page coloring: This software technique forces aliases to share some address bits. Therefore, the virtual address and physical address match over these bits.
- Using the page offset: An alternative that gets the best of both virtual and physical caches. If the page offset is used to index the cache, the virtual address translation can be overlapped with the time required to read the tags. Note that the page offset is unaffected by address translation. However, this restriction forces a direct-mapped cache to be no larger than the page size.
- Pipelined cache access: Another method to improve cache performance is to divide cache access into stages.
This leads to the following hit latencies:

Pentium: 1 clock cycle per hit
Pentium II and III: 2 clock cycles per hit
Pentium 4: 4 clock cycles per hit

Pipelining allows a faster clock while still delivering one cache hit per clock. The drawbacks are a higher branch penalty and a higher load delay. [13]

- Trace caches: A trace cache is a specialized instruction cache containing instruction traces, that is, sequences of instructions that are likely to be executed. It is found on the Pentium 4 (NetBurst microarchitecture), where it is used instead of a conventional instruction cache. Cache blocks contain micro-operations rather than raw memory; they contain branches and continue at the branch target, thus incorporating branch prediction. A cache hit requires correct branch prediction. The major advantage is that it keeps instructions available to supply the pipeline by avoiding cache misses that result from branches. The disadvantages are that the cache may store the same instruction several times and that it has more complex control. [13]

System Memory Management Mode

The system memory management mode (SMM; Intel's documentation calls it System Management Mode) is on the same level as protected mode, real mode and virtual mode, but it is provided to function as a manager. The SMM is not intended to be used as an application-level or system-level feature. It is intended for high-level system functions such as power management and security, which most Pentiums use during operation but which are controlled by the operating system.

Access to the SMM is accomplished via a new external hardware interrupt applied to the SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins executing system-level software in an area of memory called the system management RAM, or SMMRAM, which also holds the SMM state dump record. The SMI interrupt disables all other interrupts that are normally handled by user applications and the operating system. A return from the SMM interrupt is accomplished with a new instruction called RSM.
RSM returns from the memory management mode interrupt and resumes the interrupted program at the point of the interruption.

SMM allows the Pentium to treat the memory system as a flat 4G-byte system, instead of being able to address only the first 1M of memory. On entry, SMM executes software initially stored at memory location 38000H. SMM also stores the state of the Pentium in what is called a dump record. The dump record is stored at memory locations 3FFA8H through 3FFFFH. The dump record allows a Pentium-based system to enter a sleep mode and reactivate at the point of program interruption. This requires that the SMMRAM be powered during the sleep period. The halt auto restart and I/O trap restart fields are used when the SMM mode is exited by the RSM instruction. These data allow the RSM instruction to return to the halt state or to the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect upon entering the SMM mode, the RSM instruction reloads the state of the machine from the state dump and returns to the point of interruption. [14]

Memory Protection in Pentium

In protected mode, the Intel 64 and IA-32 architectures provide a protection mechanism that operates at both the segment level and the page level. This protection mechanism provides the ability to limit access to certain segments or pages based on privilege levels. The Pentium 4 also supports four protection levels, with level 0 being the most privileged and level 3 the least.

Segment and page protection can be used during software development to help localize and detect design problems and bugs. It can also be incorporated into end products to offer added robustness to operating systems, utility software and application software.
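
To make the four privilege levels concrete, here is a minimal Python sketch of the check applied when a program tries to use a data segment. The abbreviations CPL (current privilege level of the running code), RPL (requested privilege level carried in the selector) and DPL (descriptor privilege level of the target segment) follow Intel's manuals and are not used elsewhere in this essay; the function name `data_access_allowed` is hypothetical. Only the data-segment rule is modelled, and code-segment transfers, gates and page-level checks are left out.

```python
def data_access_allowed(cpl, rpl, dpl):
    """Return True if a data segment with privilege level dpl may be
    accessed by code running at cpl through a selector carrying rpl.

    Levels run from 0 (most privileged) to 3 (least privileged).
    The access is allowed only when the less privileged of CPL and
    RPL (the numerically larger one) is still at least as privileged
    as the segment, i.e. max(cpl, rpl) <= dpl.
    """
    for level in (cpl, rpl, dpl):
        if level not in (0, 1, 2, 3):
            raise ValueError("privilege levels must be 0..3")
    return max(cpl, rpl) <= dpl
```

Under this rule a kernel at level 0 can read an application segment at level 3, since `data_access_allowed(0, 0, 3)` is true, while an application at level 3 is refused access to a level 0 segment, since `data_access_allowed(3, 3, 0)` is false. On real hardware the refused access raises a general-protection fault rather than returning a value.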
This protection mechanism performs certain protection checks before the actual memory cycle is started, such as limit checks, type checks, privilege level checks, restriction of the addressable domain and so on.

The figure shows how these levels of privilege are interpreted as rings of protection. Here, the center (reserved for the most privileged code, data and stacks) is used for the segments containing the critical software, usually the kernel of an operating system. Outer rings are used for less critical software. At each instant, a running program is at a certain level, indicated by a 2-bit field in its PSW (program status word). Each segment also belongs to a certain level.

Figure 3.3.1: Protection on Pentium II

At the page level, memory protection is implemented by associating a protection bit with each frame. A valid-invalid bit is attached to each entry in the page table:

- Valid indicates that the associated page is in the process's logical address space and is thus a legal page.
- Invalid indicates that the page is not in the process's logical address space.

As long as a program restricts itself to using segments at its own level, everything works fine. Attempts to access data at a higher-numbered (less privileged) level are permitted. Attempts to access data at a lower-numbered (more privileged) level are illegal and cause traps.

3.4 Cache in Pentium Processors

One of the most common techniques for improving performance in computer systems (both hardware and software) is to use caching for frequently accessed information. This lowers the average cost of accessing the information, providing greater performance for the overall system. This applies in processor design, and in the Intel Pentium 4 processor architecture caching is a critical component of the system's performance.

The Pentium 4 processor architecture includes multiple types and levels of caching:

- Level 3 cache: This type of caching is only available on some versions of the Pentium 4 processor (notably the Pentium 4 Xeon processors).
This provides a large on-processor tertiary memory storage area that the processor uses for keeping information nearby. Thus, the contents of the Level 3 cache are faster to access than main memory.
- Level 2 cache: This type of cache is available in all versions of the Pentium 4 processor. It is normally smaller than the Level 3 cache and is used for caching both data and code that is being used by the processor.
- Level 1 cache: This type of cache is used only for caching data. It is smaller than the Level 2 cache and generally is used for the information most frequently accessed by the processor.
- Trace cache: This type of cache is used only for caching decoded instructions. Specifically, the processor has already broken down the normal processor instructions into micro-operations, and it is these micro-ops that are cached by the P4 in the trace cache.
- Translation look-aside buffer (TLB): This type of cache is used for storing virtual-to-physical memory translation information. It is an associative cache and consists of an instruction TLB and a data TLB.
- Store buffer: This type of cache takes arbitrary write operations and caches them so they may be written back to memory without blocking the current processor operations. This decreases contention between the processor and other parts of the system that are accessing main memory. There are 24 entries in the Pentium 4.
- Write combining buffer: This is similar to the store buffer, except that it is specifically optimized for burst write operations to a memory region. Thus, multiple write operations can be combined into a single write-back operation. There are 6 entries in the Pentium 4.

The disadvantage of caching is handling the situation when the original copy is modified, thus making the cached information incorrect (or stale). A significant amount of the work done within the processor goes into ensuring the consistency of the cache, both for physical memory and for the TLBs.
In the Pentium 4, physical memory caching remains coherent because the processor uses the MESI protocol. MESI defines the state of each unique cached piece of memory, called a cache line. In the Pentium 4, a cache line is 64 bytes. With the MESI protocol, each cache line is in one of four states:

- Modified: the cache line is owned by this processor and there are modifications to that cache line stored within the processor cache. No other part of the system may access the main memory for that cache line, as this would obtain stale information.
- Exclusive: the cache line is owned by this processor. No other part of the system may access the main memory for that cache line.
- Shared: the cache line is owned by this processor. Other parts of the system may acquire shared access to the cache line and may read that particular cache line. None of the shared owners may modify the cache line.
- Invalid: the cache line is in an indeterminate state for this processor. Other parts of the system may own this cache line, or it is possible that no other part of the system owns the cache line. This processor may not access the memory and it is not cached. [15]

Current Problems and Solutions Associated with Them

When you run multiple programs (especially MS-DOS-based programs) on a Windows-based computer that has insufficient system memory (RAM) and contains an Intel Pentium Pro or Pentium II processor, information in memory may become unavailable or damaged, leading to unpredictable results. For example, copy and compare operations may not work consistently.

This behavior is an indirect result of certain performance optimizations in the Intel Pentium Pro and Pentium II processors. These optimizations affect how the Windows 95 Virtual Machine Manager (VMM) performs certain memory operations, such as determining which sections of memory are not in use and can be safely freed.
As a result, the Virtual Machine Manager may free the wrong pages in memory, leading to the symptoms described earlier. This problem no longer occurs in Windows 98. To resolve this problem, install the current version of Windows. [16]

There is a little problem with sharing in
