Sunday, March 16, 2014

Cache Management in ARM

What is Cache?
Cache is a small, fast memory where a processor can keep copies of instructions and data.

Why we need Cache?
As we know, accessing main memory to fetch instructions and data is much slower than the processor clock, so fetching an instruction from memory can take multiple clock cycles. To improve this, ARM provides the concept of a cache: a small, fast block of memory which sits between the processor core and main memory and holds copies of items in main memory. In this way we get an intermediate memory that is much faster to access than main memory.

How it works?
If we have a cache in our system, there must also be a cache controller. The cache controller is hardware which manages the cache without the programmer's knowledge. Whenever the processor wants to read or write anything, it first checks for it in the cache; this is called a cache lookup. If the item is present (a cache hit), the cache returns it to the processor. If the required instruction or data is not there, it is known as a cache miss, and the request is forwarded to main memory.
Apart from checking whether data or instructions exist in the cache, there is one more block in the core: the write buffer. The write buffer is used to save further time for the processor. When the processor wants to execute a store instruction, it can place the relevant information into the write buffer: which location to write, the data to be copied, and the size of that data. After that the processor is free to move on to the next instruction, and the write buffer then puts the data into memory in the background.

Level of Cache:
There are generally two cache levels (L1 and L2), though some architectures have only one cache (L1).
L1 caches are typically connected directly to the core logic that fetches instructions and handles load and store instructions. These are Harvard caches, that is, there are separate caches for instructions and for data, and they effectively appear as part of the core. They are generally very small, 16KB or 32KB, a size determined by the need to provide single-cycle access at a core speed of 1GHz or more.
The other is the L2 cache. These are larger in size, but slower, and unified in nature: unified because a single cache holds both instructions and data. The L2 cache can be part of the core or an external block.

Cache Coherency:
As every core has its own L1 cache, a mechanism is needed to maintain coherency between all the caches: if one cache is updated and another is not, the data becomes inconsistent.
There are two ways to handle this:

1. Hardware-managed coherency : This is the most efficient solution. Data that is shared among caches is always kept up to date: every cache in the sharing domain always sees the same value for that shared data.
2. Software-managed coherency : Here the software, usually device drivers, must clean or flush dirty data from the caches, and invalidate stale data, to enable sharing with other processors or masters in the system. This costs processor cycles, bus bandwidth, and power.
