
Tuesday, March 18, 2014

ARM 3-Stage Pipeline

What is a Pipeline?
A pipeline improves the overall performance of the system by letting multiple instructions (each in a different stage) execute in parallel. So first, understand the 3 stages of the pipeline:

Fetch --------> Decode --------> Execute

As their names suggest, in Fetch we fetch the instruction, in Decode we decode it, and finally we execute it in the Execute stage.

So keeping these 3 stages in mind: whenever instruction 1 is in the Execute stage (as it started first), instruction 2 will be in the Decode stage and instruction 3 will be in the Fetch stage.

So if you look at a single instruction, there is no improvement: every instruction still takes the same 3 cycles from fetch to execute. But since one instruction can now complete every cycle instead of every 3 cycles, the overall throughput improves.
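To picture the overlap described above, here is the cycle-by-cycle view for the first three instructions (assuming one cycle per stage):

Cycle:    1       2       3        4        5
Instr 1:  Fetch   Decode  Execute
Instr 2:          Fetch   Decode   Execute
Instr 3:                  Fetch    Decode   Execute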

Problem with pipeline:
Like all other good things in the world, the pipeline also has its share of cons. A pipeline creates a problem when there is a branch instruction: while the branch was in the fetch/decode stages, the processor had already fetched and decoded the instructions right after it, but when the branch reaches the execute stage, control has to go to some other instruction instead, so that work is useless. I know it may be confusing, so let's understand this with an example:

[As I don't know assembly language well, the instruction syntax may differ from what you expect; I have used the ARM branch instruction B for the jump]
1. ADD R1, R2, R3  [R1 = R2 + R3]
2. ADD R2, R3, R4  [R2 = R3 + R4]
3. SUB R3, R4, R5  [R3 = R4 - R5]
4. B   X           [branch to label X]
5. ADD R2, R3, R4
6. SUB R3, R4, R5
7. X: ADD R1, R2, R3


So you can see that when instruction 4 (the branch) is in the Execute stage, instruction 5 will be in the Decode stage and instruction 6 in the Fetch stage. But as instruction 4 is a branch, there is no use executing instructions 5 and 6.
           In that case the pipeline is flushed, and the next 2 cycles are wasted: instruction 7 (the branch target) has to pass through the Fetch and Decode stages before it can execute, and as there is no eligible instruction in the pipeline during that time, nothing completes in those cycles.

So a pipeline does not always improve performance.

Branch Prediction:
To reduce this cost of the pipeline, architects came up with branch prediction. The hardware tries to predict whether a branch will be taken, and fetches from the predicted path so the pipeline stays full. There are different ways to achieve this; I am not covering those here. You can refer to Wikipedia or the ARM Information Center.
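Just to give a flavour of one classic scheme, here is a minimal sketch in C of a 2-bit saturating-counter predictor. This is only an illustration of the idea, not how any particular ARM core implements prediction, and all the names here are made up:

#include <stdio.h>

/* One classic scheme: a 2-bit saturating counter per branch. Two wrong
 * outcomes in a row are needed to flip the prediction, so a loop branch
 * that is almost always taken is predicted well. */
enum { STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN };

static int state = WEAK_NOT_TAKEN;

static int predict_taken(void)
{
    return state >= WEAK_TAKEN;        /* upper two states predict "taken" */
}

static void update(int actually_taken)
{
    if (actually_taken && state < STRONG_TAKEN)
        state++;                       /* move toward "strongly taken" */
    else if (!actually_taken && state > STRONG_NOT_TAKEN)
        state--;                       /* move toward "strongly not taken" */
}

int main(void)
{
    /* A loop branch: taken 9 times, then falls through once. */
    int outcomes[10] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 };
    int mispredicts = 0;

    for (int i = 0; i < 10; i++) {
        if (predict_taken() != outcomes[i])
            mispredicts++;
        update(outcomes[i]);
    }
    printf("mispredictions: %d out of 10\n", mispredicts);   /* prints 2 */
    return 0;
}

Only 2 of the 10 outcomes are mispredicted (the first, and the final fall-through), so the pipeline stays full most of the time.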

Sunday, March 16, 2014

Cache Management in ARM

What is Cache?
A cache is a small, fast memory where the processor keeps copies of recently used data and instructions.

Why do we need Cache?
As we know, accessing main memory to fetch instructions and data is much slower than the processor clock, so fetching an instruction from memory can take multiple clock cycles. To improve this scenario, ARM provides the concept of a cache: a small, fast block of memory which sits between the processor core and main memory, and holds copies of items in main memory. That way we have an intermediate small memory which is much faster to access than main memory.

How does it work?
If we have a cache in our system, there must be a cache controller. The cache controller is hardware which manages the cache without the knowledge of the programmer. Whenever the processor wants to read or write anything, it first checks the cache; this is called a cache lookup. If the item is there (a cache hit), the result is returned to the processor immediately. If the required instruction/data is not there, it is known as a cache miss, and the request is forwarded to main memory.
                            Apart from checking whether the data/instruction exists in the cache, there is one more block in the core: the write buffer. The write buffer is used to save further time for the processor. Suppose the processor wants to execute a store instruction. It can put the relevant information into the write buffer (which location to write, the data to be written, and its size); after that the processor is free to take up the next instruction, while the write buffer drains the data to memory in the background.
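To make the lookup concrete, here is a toy direct-mapped cache in C. The sizes and names are made up for illustration; a real cache controller does all of this in hardware:

#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped cache: 64 lines of 32 bytes (2 KB total). */
#define LINE_BYTES 32
#define NUM_LINES  64

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
};

static struct cache_line cache[NUM_LINES];

/* The lookup: split the address into offset/index/tag, then compare
 * the stored tag. Returns true on a hit, false on a miss (and models
 * the line being filled from main memory on a miss). */
static bool cache_lookup(uint32_t addr)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);

    if (cache[index].valid && cache[index].tag == tag)
        return true;                 /* cache hit */

    /* Cache miss: the request goes to main memory; install the line
     * so the next access to this address hits. */
    cache[index].valid = true;
    cache[index].tag   = tag;
    return false;
}

int main(void)
{
    cache_lookup(0x1000);            /* miss: first touch          */
    cache_lookup(0x1004);            /* hit: same 32-byte line     */
    return 0;
}

Notice that two nearby addresses land in the same line, which is how caches exploit spatial locality.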

Levels of Cache:
There are generally two caches (L1 and L2); some architectures have only one cache (L1).
L1 caches are typically connected directly to the core logic that fetches instructions and handles load and store instructions. These are Harvard caches, that is, there are separate caches for instructions and for data, and they effectively appear as part of the core. Generally these are very small, 16KB or 32KB, a size dictated by the need to provide single-cycle access at a core speed of 1GHz or more.
             The other is the L2 cache. These are larger in size, but slower, and unified in nature: unified because a single cache holds both instructions and data. The L2 cache can be part of the core or an external block.

Cache Coherency:
As every core has its own L1 cache, we need a mechanism to maintain coherency between all the caches, because if one cache is updated and another is not, we get data inconsistency.
There are 2 ways to solve this scenario:

1. Hardware-managed coherency : This is the most efficient solution. Data that is shared among caches is always kept up to date, so everything in that sharing domain always sees the exact same value for the shared data.
2. Software-managed coherency : Here the software, usually device drivers, must clean or flush dirty data from the caches, and invalidate stale data, to enable sharing with other processors or masters in the system. This costs processor cycles, bus bandwidth, and power. A sketch of this follows below.
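For a taste of software-managed coherency, here is a sketch using the Linux DMA mapping API, which performs the clean/invalidate operations on the driver's behalf. dma_map_single() and dma_unmap_single() are real kernel APIs; the helper function and its arguments are hypothetical:

#include <linux/dma-mapping.h>

/* Hypothetical driver helper: hand 'buf' to a device for DMA.
 * dma_map_single() cleans any dirty cache lines so the device sees
 * up-to-date data; on the receive path, a DMA_FROM_DEVICE mapping
 * invalidates stale lines instead. */
static int send_buffer_to_device(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle;

        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, handle))
                return -ENOMEM;

        /* ... program the device to DMA from 'handle' ... */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
}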

Wednesday, February 5, 2014

Why Interrupts Can't Sleep? And Nested Interrupts

So let's start with a simple and very important question: why can't an interrupt sleep?

First, go back to our understanding of user (process) context. Why can we sleep there? Because when you sleep in process context, you can call schedule(), which lets other processes run.

But when you are in interrupt context, the only thing running on that processor is the interrupt handler (of the respective interrupt), and it is not backed by any process. If you sleep now, there is nothing else to schedule, so it is as if the processor itself went to sleep, which is not good at all.
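This is why, when a handler has slow work to do, the kernel idiom is to defer it to process context. A minimal sketch using a workqueue; the device names are hypothetical, while schedule_work(), msleep(), and the handler signature are real kernel APIs:

#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>

/* Work item, assumed to be set up elsewhere in the driver with
 * INIT_WORK(&my_work, my_work_fn). */
static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
        /* Process context: sleeping is allowed here. */
        msleep(10);
}

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
        /* Interrupt context: no sleeping. Acknowledge the hardware
         * and hand the slow part over to process context. */
        schedule_work(&my_work);
        return IRQ_HANDLED;
}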

Nested Interrupt

Now there are a few things associated with interrupts, like nested interrupts. A nested interrupt is when you are running an interrupt handler and more interrupts come in on the same line. So inside one interrupt handler you are getting another interrupt.

When does this situation arise?
Suppose for some reason you need to call enable_irq() in the interrupt handler; this means you are telling your hardware that you are ready to receive a new interrupt even though the current one is still being handled.

How does the kernel handle it?
For this situation, the ARM processor has a mode called System mode (apart from User mode and exception modes such as IRQ). This is also a privileged mode. So if you want to enable nested interrupts, you first save the LR and SPSR of IRQ mode onto a stack, then switch to System mode, and only then enable interrupts.
This way we save LR and SPSR from being corrupted when the new interrupt arrives. Once the new interrupt is handled, we can return to the previous handler, and finally go back to the original location.
You can refer to this link for more clarity.