Let's Share some thoughts.............: 2014

Friday, September 26, 2014

Device Tree in Linux

In the series of short notes on kernel concepts. Here is my new post on Device Tree. Device Tree has become a important part of linux kernel now. So lets start with it in my usual way of writing :

What is Device Tree?
- Device Tree is a tree like structure, which is used to store hardware information.
- It is a description of machine's hardware that is readable by the OS, so that kernel code could separated from machine specific information.
- So using Device tree, you can get any information from it at any point of time by parsing it.

What is the benefit of Device Tree?
- It ensures a single board file for a chipset, rest of the device specific information can be moved to device tree.
- Platform devices are created at run-time by the kernel by parsing the device tree nodes.
- we can add the devices which are not present right now, but in future they will, so no need to write the code again at later stage.
- So overall it steps toward the single kernel binary for all variation of chipset, which can be clubbed with specific device tree blob to create different binaries for different variation. This will reduce the time to support new platform for the chipset.

How to enable DT support ?
- First thing is create an entry in board file, between DT_MACHINE_START & DT_MACHINE_END. you need to look into a board file to understand it better. Remember you must have dt_compat field with the appropriate dt_mach structure, which will be containing the chipset details.
- Add your device tree source(dts) file. It have the compatible , model and id field, so it should be filled with appropriate variable. [ check /arch/arm/boot/dts/ folder for .dts file]
- To enable building this tree, there are 2 ways :
a) Enable through menuconfig -> boot options -> flattened device tree
or
b) define 'select USE_OF ' in Kconfig entry in arch/arm/Kconfig

Structure of Device Tree:
Device tree describes a machine's hardware with a combination of

1) Nodes: It represents the device or bus. Contains properties and child nodes. It is defined as node_name@unit_address.

2) Properties: provides information about a device. It consists of a name and a value. Properties can further be categorized as Standard or Non-Standard.

Standard Properties:
a) compatible : it is used to distinguish between different nodes. Recommended format for compatible property is "manufacturer, model". OS can use it for machine/device driver selection.

b) phandle : specifies an unique numerical identifier for a node with the device tree. Used by other nodes that need to refer to the node associated with the property.

c) status : Indicate the operational status of a device. Values for this could be "okay", "disabled", "fail".

d) #address-cells,#size-cells : May be used in any device node that has children in the device tree hierarchy. Describes how the child nodes should be addressed. #address-cells tells no of cells to encode the address field in child's reg property. #size-cells tells no of cells to encode the size field in child's reg property.

e)reg : Arbitrary number of pairs of address and size. It describes the address and the size od the device's resources. It tells no of cells to specify address and size are specified by the #address-cells and #size-cells properties in the parent of the device node.

These are the main things. If you want more details, I will suggest you to refer Documentation/devicetree in linux kernel.

Thursday, September 25, 2014

Short Notes on System Call

This post covers pretty much all main points which you need to know about System Call in Linux. These points are my notes which I covered through many blogs/sites for my study. So lets start it :

What is System Call ?

- As Linux doesn't allow user process to directly modify anything in kernel space. It has provided a way to achieve it, which is known as System Call. So basically system call provide a layer between the kernel and user space process. Using this, kernel performs user space process's task inside the kernel space. This way kernel have control on, what a user process can do.

Why you need a System Call ?

- As system call is the only legal entry point to kernel, sometimes you may need some information, which can only be provided with kernel space credentials, so Kernel developers give you some system call. Although user space doesn't use it directly(in general), they uses some API for the same.

So the flow will go like

read() ----> read() wrapper ------> system_call() ------> sys_read()

Here first two thing is in user space and last two is in kernel space.

How do they work ?

- Application program calls the API.

- A system library routine is called first.

- It transforms the call to the system standard and traps to the kernel.

- Control is taken by the kernel running in the system mode.

- According to the code, the call dispatcher invokes the responsible handler.

- Interrupt is disable during this handler.

- After call is finished, mode is changed from system mode to user mode and calling process execution resumes.

How it works at low level ?

- So whenever you call any system call, it is translated to SWI .

- Whenever processor sees SWI, it interrupts whatever is running currently( except IRQ/FIQ), changes the mode to supervisor mode(SVC).

- Whatever parameters were there in the system call, are passed using registers.

- Also PC is stored in LR of svc mode and CPSR is saved in SPSR_svc, so that we can recover the previous state while returning from SWI.

- On execution of SWI, processor looks for SWI_handler address, which was already defined in the vector table.

- From this SWI handler, processor jumps to the specific system call handler. Here we use the system call number to get the corresponding handler.

- Once the system call handler finishes its execution. Processor change the mode to user, and recover PC from LR.

Type of System Call :

- Process control and IPC : fork(), exec(), pipe()

- Memory Management : malloc(), free()

- File and File System management : open(), read(), write()

- Device Management : ioctl()

- Others : kill(), signal()

Implementation of System Call :

For this you can refer my previous blog on creating system call.

Friday, September 19, 2014

Add a Custom System Call in Linux Kernel

Here I will be covering a small tutorial on creating new system call in Linux Kernel. I am using Latest Kernel version 3.16 ( It is latest at the time of blog written).

I added my call for 32 bit system only.

So the whole process is divided in 5 step.

Step 1: Open arch/x86/syscalls/syscall_32.tbl . Here go to the last line in the file. It will be containing a number in first column, this number tells that it is the last number used by system for system call. So lets say the number is 356, so your new system call will have number 357. Now just duplicate the last line, and change the number and name of the sys call. Let say it is "hello". So the whole line will look like :

357 i386 hello sys_hello

Step 2: Add the syntax of syscall in include/linux/syscalls.h. Suppose this sys call takes 2 int parameters. So for syscall "hello", your new line should be like this :

asmlinkage long sys_hello(int a, int b);

Step 3: Now add the entry in /kernel/sys_ni.c. So entry will be like :

cond_syscall(sys_hello);

Step 4: Add the function definition for sys call. Open kernel/sys.c. you can add it at different place too. Now as our sys call is having two parameters, so the function will look like this :

SYSCALL_DEFINE2(hello /*name of syscall */, int /*type of first parameter */, a /*name of first parameter*/, int /*type of second parameter */, a /*name of second parameter*/)

{

int error = -EINVAL;

// code for whatever you want to do in syscall

return 0;

}

That's it, your system call is created, but to reflect it in your kernel, you have to build it. you can refer my blog

Step 5: Test the system call. create a userspace program. Don't forget to add sys/syscall.h header.

Now to call our hello syscall.

int call = syscall(357, 1,2); // here 357 is our system call number, 1 is val for a and 2 for b.

That will be all from my side. I hope it help you to understand syscall.

Wednesday, July 30, 2014

Insertion Sort : In non increasing format

Here is the pseudo code for arranging number in non increasing format using Insertion sort. This is one of the exercise of Book Introduction to Algorithms by CLRS.

//N is array's length
for j = N-1 to 1
key = A[j];
i = j + 1;

while i<=N && A[i] > key
A[i-1] = A[i];
i = i +1;
A[i-1] = key;

Input : 5 4 3 2 6
Output : 5 4 3 6 2
5 4 6 3 2
5 6 4 3 2
6 5 4 3 2 [Final]

Thursday, July 24, 2014

Short Notes on Memory in Linux

Most of part of this blog will be containing notes from Robert Love's Linux Kernel Development. But I will try to add the general questions also in this blog.

The most basic unit of memory is page. Kernel use page structure to keep track of all the pages in the system. Through this kernel can be informed that whether page is used or free, if used then who is using it etc.

Zone : Because of hardware limitation the kernel can't treat all pages identical. So it is divided in 3 parts mostly : ZONE_DMA(upto 16 MB), ZONE_NORMAL(16 to 896 MB) and ZONE_HIGHMEM(896 & above).

Main limitation to divide pages in Zone are:
1. Some hardware devices can perform DMA to only certain memory address.
2. Some memory can't be permanently mapped into kernel address space.(eg: HIGMEM)

Page allocation interface :

1. struct page * alloc_pages(gfp_t gfp_mask, unsigned int order) : This returns 2 to the power order pages contiguous physical page. As this function returns a page pointer, to get the logical address of it, we use page_address(struct page * page) method.
2. unsigned long __get_free_pages(gft_t gfp_mask, unsigned int order) : it returns the logical address.
3. unsigned long get_zeroed_pages(gfp_mask) : it is useful for page allocated for user space, as it zeroes the content of allocated page, so that sensitive data doesn't pass to user space.
4. void * kmalloc(size_t size, gfp_t flags): this function returns a pointer to a region of memory that is atleast size bytes in length. This memory is physically contiguous.
5. vamlloc() : it allocates virtually contiguous memory. It does this by allocating potentially non contiguous chunks of physical memory and fixing the page table to map the memory into a contiguous chunk of virtual memory.

Deallocation interface :

1. __free_pages(struct page *page, unsigned int order) : Always ensure you are deallocating only those page which you allocate.
2. kfree(const void *ptr)
3. vfree(const void *ptr)

Introduction to Slab Layer:
Allocation and deallocation memory is the most common operation in Kernel. Because of it, there is good chance of de-fragmentation. Which is complete waste of resources, as you are having the free memory but you cant use it as it is not contiguous.
To resolve this issue, Linux kernel came up with the slab layer concept. So most frequently used data structure is allocated through cache. At the start-up we create the caches for all major data structure. Cache is further divided in slab. And slab contains a page. Apart from that Cache also maintain 3 list, empty list, full list and partial list.
So whenever allocation request comes, Cache check whether any empty page is there on any of the list. It allocates there and then return the page. Once that structure is released, the corresponding page is also released and return back to the free list.

This was the basics, now lets try with questions which I faced, these questions may not in order of relevance:

1. What is the return address of kmalloc() ? Physical or Virtual?
Ans : it always return the virtual address, and the allocated memory region will be physically as well as virtually contiguous.

2. How memory is allocated for program ?
Ans : When ever program compilation is done. At the time of loading, First program is loaded in memory, then mapped it to the Virtual memory. So it will be having page table with no association with physical memory. At the time of use, means when you want some memory for operation, it goes to page table, there MMU find that no physical memory is allocated, so it do the page_fault(). Then the physical memory is allocated and corresponding entry is created in the page table.

3. logical address and virtual address ?
Ans : Most important thing is logical address comes in picture if you have segmentation unit in your system. If it is not there then there is no logical address. So the address translation happens like this :
logical address => [segmentation unit] => virtual address => [paging unit] => physical address

4. Why physically contiguous memory region is more efficient than virtually contiguous memory?

Ans : As memory allocated is physically contiguous can use a concept called huge pages. With this higher page size can be used, so correspondingly there will be lesser page entries in the table.Huge pages can improve performance through reduced page faults (a single fault brings in a large chunk of memory at once) and by reducing the cost of virtual to physical address translation (fewer levels of page tables must be traversed to get to the physical address).

5. How Shared memory is used?
Ans : For shared memory we use IPC like shmem. So whenever two process wants to share some memory. They use shmem IPC to get an id for the associated memory, further this shared memory is mapped to both processes's address space. So for those process it looks like local memory. Internally the vm_area_struct (virtual memory area) uses VM_SHARED flag to show this memory area as shared memory.
Also when we create a child process with CLONE_VM flag, at the time of creation it skips the allocate_mm() call (which is actually responsible for memory allocation) and assign it's mm structure to its parent's mm structure.

6. Where does the memory is allocated for kernel stack ?
Ans : Kernel stack is allocated in the kernel space, remember not is user space memory. But it is mapped to that Process's address space, not only that but to all other process's address space. As apart from kernel no body will be using this, so it no process is able to recognize it and they don't have permission to access it.

7. Can a process use whole 4 GB address ?
Ans : No, it can't. Remember, memory is divided in memory area. So a process can only access only those memory areas for which it has permission. Even to add or remove memory area, it has to request kernel.

Sunday, June 29, 2014

How to Submit android app to Amazon app store

If you are new to android development and specifically planning to launch your app to Amazon app store. Then you must be feeling a little bit left behind, because there you won't find much over net apart from Amazon's own material.

Although it is mentioned on their site clearly, I thought of repeating it, because it will just some one else to find the right thing :

So if you want to add the ad api so here is the way :

1. Download sdk from here.

2. Now add the Jar in Eclipse using Install New Software. After that add this link as name Amazon.

3. Now Import your android project, remove all the google play service api from it, and add Amazon api. To do that right click on your project, and click on Properties. And then in that click on Amazon Mobile app SDK.

4. There you can select whatever you want to use, Mobile Ads, ADM, In-App purchasing etc.

5. Now it comes to the coding part. Wherever you want to use it, include the following code :

AdRegistration.setAppKey("37de83fd21234ed4811e241507876542F"); // put your Application id

this.adView = (AdLayout) findViewById(R.id.adview);

AdTargetingOptions adOptions = new AdTargetingOptions().enableGeoLocation(true);

// Optional: Set ad targeting options here.

this.adView.loadAd(adOptions); // Retrieves an ad on background thread

And you have to create corresponding entry in xml :

android:id="@+id/adview"

android:layout_width="match_parent"

android:layout_height="wrap_content"/>

6. And you are good to go.

7. Use Test your app option on Amazon before uploading your apk to Amazon app store.

I hope this blog help you to publish your first App on Amazon app store.

Thursday, June 26, 2014

Android : How to Check whether device is rooted or not ?

Many time user go for rooting their phone to break the constraints put by operator and OEMs.

So Rooting your phone opens a door of new features which could be really useful.

To root your phone, I will suggest you to go to Xda developers. They have wide range of tutorial specifically for each devices. So once you have followed their steps, and just wanted to confirm whether your device has been rooted, Here is a simple app called Root Verifier Pro Launched by Bitdroid Devz.

I will suggest you to try it, it worked fine for me.

Thursday, June 19, 2014

HWComposer : Android Surface Composition Logic Module

After my last post of SurfaceFlinger few people asked me to write on HWcomposer, which is tightly associated with SurfaceFlinger.
I here would like to mention that Google's Graphics Architecture link helped me a lot. So I will be mentioning many of those points here again.
So here we go

HWComposer(HWC)'s primary purpose is to determing the most efficient way to composite buffers with the available hardware.
You can easily recognize the value of HWC, when you consider "overlay planes". The purpose of overlay planes is to composite the multiple buffers together, but in the display hardware(MDP in case of QCom) rather than the GPU. You can refer this link for diagram.

The mentioned link describes it very well, it goes like :
In General you have three buffers for Android screen at a time, one for Status bar, second for Navigation bar and third for Application. So instead of compositing all three buffers, SurfaceFlinger pass these buffers to Display Hardware to process. So HWC works like this :

SurfaceFlinger provides the HWC with a full list of layers, and asks, "how do you want to handle this?"

The HWC responds by marking each layer as "overlay" or "GLES composition."

SurfaceFlinger takes care of any GLES composition, passing the output buffer to HWC, and lets HWC handle the rest.

Now as mentioned in 3rd point, Surfaceflinger passes the output buffer to HWC. This is quite useful when there is no update in the screen. In this scenario, HWC can choose GLES composition for those buffer. Because of this, When Surfaceflinger comes up with the same set of buffers(as no screen is changed), HWC can just continue to show the previously composite buffer.

Code Flow:

Linking of hwc module :

SurfaceFlinger::init() -> HWComposer() -> loadHWCModule() -> hwc_open_1 -> (module->methods-->open) ->open() [defined in hwc.cpp]

Now whenever Surfaceflinger have new buffers to update, it calls setupHWComposer(). This function is used to get all layers, using
currentLayers = hw->getVisibleLayersSortedByZ();

after setting the data, it calls prepare() which in turns call prepare()[hwc.cpp][I will definde the details of this function later].

After all processing and composition is done, we call postFrameBuffer(), it calls commit() and finally hwc::set(). Which goes like this :

hwc::set() -> hwc_set_primary() -> (mFbUpdate->draw())* -> ov.queueBuffer() -> GenericPipe::queueBuffer() -> Data:queueBuffer()(OverlayCtrlData.h) -> MdpData:play()(overlaymdp.h) -> play()(MdpWrapper.h) ->ioctl(fd,MSMFB_OVERLAY_PLAY,offset)

this fd and offset it is getting from
fbLayer = &list->hwLayers[last];
hnd = (private_handle_t *) fbLayer->handle;

after (mFbUpdate->draw()) calls finishes, flow goes like
(Overlay->displayCommit()) -> displayCommit()(Overlay.cpp) -> displayCommit()(MdpWrapper.h) -> ioctl(fd, MSMFB_OVERLAY_COMMIT, info)

After this lcd driver starts doing the final task.

Now lets understand what prepare() function do.

prepare is called for each frame before composition and is used by SurfaceFlinger to determine what composition steps the HWC can handle.
The HWC responds by setting the compositionType field in each layer to either HWC_FRAMEBUFFER or HWC_OVERLAY.
In case of HWC_FRAMEBUFFER, composition is handled by SurfaceFlinger using OpenGL ES, in later case HWC will have to handle the layer's composition.

Now other important function in hwc.cpp is set(). When this call returns the caller assumes that the display will be updated in the near future with the content of their work list.

Wednesday, June 11, 2014

difference between Kernel Thread and User thread

1. Kernel threads run only in Kernel Mode, while regular processes run alternatively in Kernel Mode and in User Mode.

2. Because kernel threads run only in Kernel Mode, they use only linear addresses greater than PAGE_OFFSET. Regular processes, on the other hand, use all four gigabytes of linear addresses, in either User Mode or Kernel Mode.

Process 0 :
This is the ancestor of all processes, also known as the idle process. it is a kernel thread created from scratch during the intialization phase of linux. This ancestor proces uses the following statically allocated data structures.

1. A process descriptor stored in the init_task variable, initialized by the INIT_TASK macro.
2. A thread_info descriptor and a Kernel Mode stack stored in the init_thread_union variable and initialized by the INIT_THREAD_INFO macro.
3. The following tables, which is the process descriptor points to:

- init_mm ( initialized by INIT_MM)
- init_fs (initlialized by INIT_FS)
- init_files (initlialized by INIT_FILES)
- init_signals(initialized by INIT_SIGNALS)
- init_sighand(initialized by INIT_SIGHAND)

4. The master kernel Page Global Directory stored in swapper_pg_dir.

The start_kernel() function intializes all the data structures needed by the kernel enables interrupts, and creates another kernel thread, named process 1 (known as init process).

After creating init process, Process 0 is selected by scheduler only when there are no other processes in the TASK_RUNNING state.

For multiprocessor system, there is a process 0 for each CPU.

Monday, June 2, 2014

Sunday, June 1, 2014

Generate all combination of elements of an array : Power Set Algorithm

#include

#define MAX 100010

using namespace std;

int arr[MAX];

void powerset(int arr[], vector v, int start, int end)

{

if(start > end)

return;

v.push_back(arr[start]);

vector::iterator i;

for(i = v.begin(); i

{

cout << *i;

}

cout << "\n";

powerset(arr,v,start+1,end);

v.pop_back();

powerset(arr,v,start+1,end);

}

int main()

{

int T,N;

cin >> T;

while(T-- > 0)

{

cin >> N;

for(int i=0;i

cin >> arr[i];

vector v;

powerset(arr,v,0,N-1);

}

return 0;

}

Wednesday, May 28, 2014

Notes on Insertion Sort

Insertion Sort is a way of sorting elements. While using insertion sort, we traverse from right to left element.

Steps :

1. Assume 0th element is already sorted.
2. Starting i from first element till last :
- In a loop(j ), check if the previous element is greater than the comparing element( for example for first time, we will be comparing 1st element to 0th element.)
- If previous element is greater than comparing element, we will replace the j+1th element with j.[Remember before doing this keep array[i] stored in some variable, as it will be compared to all elements].
- when loops finished, just replace the last index element with array[i] stored value.

Code :

for(i =1; i{
int val = arr[i];
int j = i-1;

while( j >=0 && A[j] > val)
{
A[j+1] = A[j];
j--;
}

A[j] = val;
}

Wednesday, April 23, 2014

kmalloc() V/S other page allocation methods

If you are developing anthing in Kernel, I hope you have used kmalloc() many times till now. But if you see there are other function also to allocate physical memory. Those are :

__get_free_page(unsigned int gfp_mask)

__get_free_pages(unsinged int gfp_mask, int order)

get_zeroed_page(unsigned int gfp_mask)

So when we use kmalloc() and when we use the above functions ?

Answer lies in your requirement, if you need very small memory to allocate, go for kmalloc, because there you define the minimum no of bytes(let say n) you need to allocate. But remember it just ensure that atleast n bytes will be allocated, but in reality, it allocates more than it, sometimes even double of the size. So in short there is no fine granuality with kmalloc().

while get_free_page family is used when you want to allocate memory in terms of pages, as for kernel, page is the smallest memory unit. So you will pass the order x, and it will return you 2 power x pages. So here Kernel ensures that only requested no of pages will be allocated. So there will not be any memory waste.

And if you want to allocate memory from high memory, then there is a seperate function : [Although these function can be used for normal zones too]

alloc_pages(unsinged int gfp_mask, unsigned int order)

alloc_pages(unsinged int gfp_mask)

Sunday, April 20, 2014

Short Note on Android App Signing

I know you will find many articles for mentioned title. But I thought of summarizing it , So Here are the steps :

1. You need Keytool and Jarsigner. If you have java installed in your system, then you have these utilities.

2. Now generate a private key for your app, So here is the command :
keytool -genkey -v -keystore my-release-key.keystore -alias ANY_NAME -keyalg RSA -keysize 2048 -validity 10000

This will ask you few basic question, answer those. finally it will ask the password, give it, this will encrypt your keystore file.

3. I hope as you are an android developer, you must be having Eclipse with you. So Go to File -> export -> Android -> Export Android Application.

4. Select the project from which you want to export the application.

5. Select keystore, remember in step 2 you created my-release-key.keystore, give path to this file. And entered the password you have entered at the time of creation.

6. Next it will ask key alias selection, So whatever ANY_NAME you given in step2, choose that and again give the password.

7. Finally it will ask you the location, where the signed application will be exported. So just give the path and click finish.

And you are done.

Monday, April 14, 2014

Create a structure which can contain any type of data

Generally whenever we create a structure, we define that what kind of data it will be storing. But think it this way, what if you don't know what kind of data structure it can use in future, to avoid that situation. Here is the way :

#include
#include
struct node
{
void *ptr;
struct node *link;
};
int main(void) {
// your code goes here
struct node *head,*temp;
float var = 10.5f;
head = (struct node *)malloc(sizeof(struct node));
*(int *)(head->ptr) =10;
head->link = NULL;

temp = (struct node *)malloc(sizeof(struct node));
*(float *)(temp->ptr) = var;
temp->link = NULL;

head->link = temp;

printf("Value = %d",*(int *)(head->ptr));
printf("Value = %f",*(float *)(temp->ptr));
return 0;
}

Hope it helps you in your next interview :)

Friday, March 21, 2014

Mutex Vs Binary Semaphore

Most of the time people get confused with these two. So here I will try to explain the difference using this answer at Stack Overflow.

So Mutex is a lock which have the owner association, So it means whoever holds the mutex lock will only be able to unlock this mutex. So Mutex is used when you want to lock the resources.

While Binary Semaphore is used for signalling mechanism, It doesn't lock the resource. So any other thread can also unlock the semaphore held by your thread. In other words it signals your thread to do something.

Thursday, March 20, 2014

GDB : Debugger for simple c file

Hi,

It may be out of track article for kernel, but it could be good if you have single file.
So GDB is a debugger utility in linux for debugging C/ C++ program.

Step 1 : So first how you will enable your program to do so :
In your compilation command, use -g option, for example :
gcc -g program.c -o program

Step 2 : Now to run the utility, just type gdb on the shell. It will give you something like (gdb) shell

Step 3: Load the file which you want to debug,
(gdb) file filename.c

Step 4 : Put break points at lines
(gdb) break filename.c : Line number
Put break points at function
(gdb) break func_name

Step 5 : type run. It will run till it reaches your first break point.

Step 6 : Now suppose you want to check the value of some variable use print
(gdb) print x

Step 7 : If you want to just go to next break point, just type
(gdb) continue

Step 8 : If you want to go step by step, use either next/step command.
(gdb) next
or
(gdb) step
Only difference is, next will treat any sub-routine call as a single instruction.

Now apart from break point there is one more concept as Watch point, it works on variable instead of function or line number. It will stop code execution when the value of variable on which watchpoint is set, has been changed. Syntax is
(gdb) watch variable_name

There are other command also

1. backtrace - produces a stack trace of the function calls that lead to a seg fault (should remind you of Java exceptions)
2. where - same as backtrace; you can think of this version as working even when you’re still in the middle of the program
3. finish - runs until the current function is ﬁnished
4. delete - deletes a speciﬁed breakpoint
5. info breakpoints - shows information about all declared
breakpoints

Also we can use condition in breakpoints, for example you want to break only when i < 1,so do like this :
(gdb) break program.c : 7 if i < 1.

Tuesday, March 18, 2014

Build kernel from source code

If you are newbie to the kernel, It is a matter of time when you have to download kernel source and build it. Although working in Linux kernel domain for 2 years, I never did it earlier, so finally today I did it because of eudyptula-challenge.org.

So here are the steps, you should do to build your kernel.

Step 1 : Download the source code from git.

Step 2 : Set up the config file. It is the file which tells at the time of booting what all configuration needed to be enabled. So easiest way to create config file sudo make localmodconfig

Step 3 : Further if you want to modify any change in config file, use sudo make menuconfig. Now change any configuration you want, you can select y/n/m.

Step 4 : Now its time to build your kernel, you can use either make or you can use make deb-pkg

Step 5 : To install the image,

- If you used make earlier, then call make install.

- If you used make deb-pkg, then install the image first:

sudo dpkg -i linux-image-3.14.0-rc6-00145-ga4ecdf8_3.14.0-rc6-00145-ga4ecdf8-8_i386.deb
Then install the headers :sudo dpkg -i linux-headers-3.14.0-rc6-00145-ga4ecdf8_3.14.0-rc6-00145-ga4ecdf8-8_i386.deb

you can replace the version accordingly in above commands.

After reboot, you will be able to see the new linux kernel in your grub list.

ARM 3-Stage Pipeline

What is Pipeline?
Pipeline is used to improve the perforamnce of the overall system by allowing multiple instructions(which are in different stage) parallely. So first understand the 3 stages of pipeline :

Fetch--------> Decode -----------> Execute

As per their names, In Fetch, we fetch the instructions, in Decode we decode the instruction and finally execure in Execute stage.

So keeping these 3 stages in mind, Whenever instuction 1 is in Executing stage(as it started first), instruction 2 will be in Decoding stage and instruction 3 will be in Fetching stage.

So if you see this, then there will not be any improvement for a single instruction, as every instruction will be taking the same 3 cycles to execute. But the overall performance will be improved.

Problem with pipeline:
As like all other good things in the world, pipeline also have it share of cons. Pipeline creates problem when there is a branch instruction, because whenever branch instruction is in executing stage, It has to go to some other instruction instead of the instruction which processor had taken in fetch/decode stage while branch instruction was in fetch/decoding stage. I know it may be confusing, so lets understand this with an example:

[As I dont know assembly language, instruction syntax can be different what you expect]
1. ADD R1,R2,R3 [Add R2, R3 and keep it in R1]
2. ADD R2,R3,R4
3. SUB R3, R4,R5
4. JMP X
5. ADD R2,R3,R4
6. SUB R3, R4,R5
7. X:
8. ADD R1,R2,R3

So you can see when instruction 4(JMP) will be in execution stage, instruction 5 will be in decode stage and instruction 6 in fetch stage. But as isntruction 4 is a jump statement, thee is no use of executing instruction 5 and 6.
In that case pipeline will be flush, and next 2 cycle will be waste. As for instruction 7 to execute, you will need 2 cycle, and as there is no eligible instruction to execute in pipeline, there will not be any output during this cycle.

So it is not necessary that pipeline will always improve performance.

Branch Prediction:
To overcome this problem caused by pipeline, Architecture came up with branch prediction. It tries to predict which could be executed next, . There are different ways to achieve this, I am not covering those here. you can refer wikipedia or ARM information center.

Sunday, March 16, 2014

Cache Management in ARM

What is Cache?

Cache is a place where a processor can put data and instruction.

Why we need Cache?

As we know accessing memory to fetch instructions and data is way slower than processor clock, so fetching any instruction from the memory can take multiple clock cycle. To improve this scenario, ARM provides the concept of Cache, it is a small, fast block of memory which sits between processor core and memory. It holds copies of items in main memory. That way we have kept one intermediate small memory which is faster than main memory.

How it works?

If we have cache in our system, there must be a cahce controller. Cache controller is a hardware which manages cache without the knowledge of programmer. So whenever processor wants to read or write anything, first it will check it in the cache, this is called cache lookup. If it is there it will return the result to processor. If required instruction/data is not there, it is known as cache miss. And request will be forwarded to main memory.
Apart from checking whether data/instruction existence in the cache, there is one more thing in core, which is a write buffer. Write buffer is used to save further time for processor. So suppose processor wants to execute a store instruction, what it can do is, it can put the relevant information such as which location to write, which buffer needs to copy, and what is the size of buffer which is getting copied into this write buffer, after that processor is free to do take next instruction, Write buffer will then put this data into the memory.

Level of Cache:
There are generally two caches(L1 and L2), some architecture have only one cahce(L1) too.
L1 caches are typically connected directly to the core logic that fetches instructions and handles load and store instructions. These are Harvard caches, that is, there are separate caches for instructions and for data that effectively appear as part of the core. Generally these have very small size 16KB or 32KB. This size depends on the capability of providing single cycle access at a core speed of 1GHz or
more.
Other is L2 cache, these are larger in size, but slower and unified in nature. Unified because it uses single cahce for instructions and data. L2 cache could be part of core or be a external block.

Cache Coherency:
As every core has its own L1 cache, It needs a mechanism to maintain the coherency between all the caches, because if one cache is updated and other is not, This will create data inconsistency.
There are 2 ways to solve this scenario:

1. Hardware manage coherency : It is the most efficient solution. So the data which is shared among caches will always be updated. Everything in that sharing domain will always contain the same exact value for that share data.
2. Software manager coherency : Here the software, usually device drivers, must clean or flush dirty data from caches, and invalidate old data to enable sharing with other processors or masters in the system. This takes processor cycles, bus bandwidth, and power.

Wednesday, March 12, 2014

why to use request_threaded_irq?

I know for many this function may be very clear, but I found it really difficult to digest this function. So lets start it :

Why we need it ?
From the starting, it was desired for kernel to reduce the time for which processor stays in interrupt context. To solve this initially they introduced Top half and bottom half concept, in which they keep time critical task in top half and rest of the work they kept in bottom half. When processor is executing the interrupt handler it is in interrupt context with interrupts disabled on that line, which is not good because if it is a shared line, other interrupts won't be handled during this time, which in turn effect the overall system latency.
To overcome this problem, kernel developers came up with the request_threaded_irq() method which futher reduces the time.

How it works?
Before going further with the functionality, lets check the function definition first

`int request_threaded_irq (`	unsigned int `irq`,
	irq_handler_t `handler`,
	irq_handler_t `thread_fn`,
	unsigned long `irqflags`,
	const char * `devname`,
	void * `dev_id)`;

Difference between this function and usual request_irq function is the extra thread_fn.

Now lets understand the functionality, request_threaded_irq() breaks handler code in two parts, 1st handler and 2nd thread function. Now main functionality of handler is to intimate Hardware that it has received the interrupt and wake up thread function. As soon as handler finishes, processor is in process context.

so processor is free to receive new interrupts. It improves the overall latency.

Misc:

Some driver code uses request_threaded_irq() with NULL as a value for handler, in that scenario kernel will invoke the default handler, which simply wakeup the thread function.

When to use request_threaded_irq instead of bottom halves ?

Answer lies in the driver's requirement, if it wants to sleep put the code in thread fn and user threaded function.

Thursday, March 6, 2014

DMA : Direct Memory Access

Why we use it ?

In normal scenario whenever data transfer is done, processor should be informed. If there is more data, it could lead a lot of time of CPU. To overcome this issue, DMA concept came in picture, using DMA, computer can access system memory without telling Processor.

Earlier when DMA was not there, CPU will be busy when transfer is happening, but now with DMA, CPU can do other operations while transfer is happening.

Misc:

DMA has its configuration, which describe what it will be transfering, from where it will be taking data. what is the size? There are many other settings too. There will be one internal buffer also, which will be taking data from source and then putting it at destination.

When we want to start the transfer, there will be one bit associated, when you write it, it will intiate the transfer.

Thursday, February 27, 2014

I2C Communicatiom

I2C (Inter-Integrated Circuit) protocol is used to transfer data between Master and Slave device. There are 2 lines between master and slave, one for data and other for clock.

I2C is synchronous transfer mechanism, as it uses clock to control the data. So for a particular transfer, Master/ Slave will be transferring/recieving at the same rate.

Master is known as master, because only it is who can initiate the transfer.

There are two type in which you can register a slave device, either it is connected to interrupt controller, or through GPIO lines you can use it. When registering to interrupt controller we use "i2c" while in other case we use it as "i2c-gpio".

In device side, we define i2c_board_info for every slave, this will containg slave name, slave address, also platform data. Futher this slave will be register using i2c_register_board_info(), in this method we will pass the bus number and the slave.

Flow will differ if we are using gpio, there we will create device as name "i2c-gpio" and bus number, it will pass this device in platform_add_devices() to get it registered.

At driver side, we will be define all functions like probe, remove, NAME(it should be same), pm_ops. Finally we will call i2c_add_driver(). So as soon as linking between device and driver happens, it will call probe function.

Now lets take example of Touch, there whenever user touches screen, it will generate the interrupt(it was registered in probe). Through this only Master(touch screen driver) will know that there is some data available at the slave side which needs to be read. Then it will initiate the connection and read data one by one. same is applicale for reading, a little bit flow changes for it.

Monday, February 24, 2014

Some important Terms useful for Display Driver Engineers.

As being part of display driver team, most of the time I need to refresh few of the terms used in display prcessor.

so here are few of those, I will try to keep on adding here as I encouter any of the term : [I will try to give you simplest meaning for fast understanding, you can refer wikipedia or any other source for more details]

1. Tiling : It is like generating a larger graphics from re-using a number of smaller graphics to save RAM and increase real-time rendering performance. For example if there is picture of grass field, so you can only store of a portion of that field, and while displaying you can re-use the same portion again and again to create full scale image.

2. Gamma Correction : Human vision, under common illumination conditions (not pitch black nor blindingly bright), follows an approximate gamma or power function. If images are not gamma encoded, they allocate too many bits or too much bandwidth to highlights that humans cannot differentiate, and too few bits/bandwidth to shadow values that humans are sensitive to and would require more bits/bandwidth to maintain the same visual quality. So with this correction, we define a linear function, which coverts the image to compatible image gamma corrected image.

3. Dithering : Dithering is used in computer graphics to create the illusion of "color depth" in images with a limited color palette - a technique (also known as color quantization). In a dithered image, colors that are not available in the palette are approximated by a diffusion of colored pixels from within the available palette.

So in simple terms using it, we create new colors which is not availble in the color palette(set of colors).

4. Histogram: Its graphical representation of the distribution of the intensity(brightness perception of a color for humans) of the pixels present in a picture. On the basis of this histogram, further brightness of every pixel can be modified to improve the end result.

5. Alpha Blending: Alpha blending is the process of combining a translucent foreground color with a background color, thereby producing a new blended color. So it foreground is transparent then final result will be background color, if foreground is opaque, final result will be foreground color only.

6. Color Keying: This technique is used for compositing two images together based on color hues. It is used heavily in many fields to remove a background from the subject of a pahoto or video.

Tuesday, February 18, 2014

Memory Management in Linux

This post is going to be little vague. As I will not be explaining all the basics which most of the book of the same topic covers.

Lets talk about kernel memory allocation Most of the time we use kmalloc() function. Although it is supposed to be used only when kernel need contiguous physical memory. But to save overhead caused by vmalloc(), most of the time kmalloc() is preffered.

So How it allocates ?
Before that first we should understand the hierarchy, on top kmalloc() call is there, then slab allocator comes and finally buddy allocator which finally interacts with the physical memory. So at the time of boot up, slab allocator asks buddy allocator to return a fixed no of pages. Further slab allocator will divide those pages in different size of Caches(like 32, 64 ...MAX byte).
Now suppose in your kernel code you want to allocate a memory for int using kmalloc(), it will search the position for this data only in 32 bit slab. Further every slab has 3 lists, full, partial and empty. These list will have pages. So whenever it wants to allocate a memory for any object, it will first check if any partial page available for that slab, if yes it will allocate memory in that page, if not it will check for free pages, if there also no page is available it will ask buddy allocator to give new page.

Use of High Memory:
It is used when you have more physical memory than required virtual memory. For example, for 4GB virtual memory and 1.5 GB RAM, you will have 1 GB for kernel and 3GB for user space. So you have more physical memory (1.5 GB) than virtual memory(1 GB).
So there is no one to one mapping exists between physical and virtual memory, to solve this problem, in x86 architecture, they created one-to-one mapping for upto 896 MB, memory more than that will be dynamically mapped, so for high memory page will have NULL as a virtual address in page structure.

Tuesday, February 11, 2014

Semaphore

As we discussed in previous post, Spinlock is spinning lock, the same way Semaphore is sleeping lock.
So think of a scenario in which a task want to acquire a semaphore which is not available, so it will go to sleep and semaphore will put it on a wait queue. Once this semaphore becomes available, scheduler will invoke any of the task on this waitqueue.

Properties:
1. As the task holding semaphore can sleep, it is best suitable for the task which runs for longer duration.
2. As the task holding semaphore can sleep, it can only be used in Process Context.
3. Task cant hold a spinlock if it wants to acquire a semaphore. Because task might want to sleep while waiting for semaphore, but this will cause processor to sleep for infinte time(if it is holding spinlock), as there is no one to wake it up.

Monday, February 10, 2014

Spin lock

Lets start with spin lock. First we will understand what is it and further I will try to cover few doubts about it, which could be quite useful in Interviews.

What is Spin lock :
Spin lock is a lock that can be held by at most one thread of execution. Suppose one thread is trying to acquire a spin lock, so first it will check whether this spinlock is held by other thread or not, if it is available it will acquire the lock otherwise it will constantly keep on checking (busy looping) for this lock.

Properties :
1. It should be held for small duration only, because during the time a spin lock is held no other thread can be scheduled on that processor.
2. On uni processor, if spin lock is held, it means it will now behave as kernel preemption is disable. As no other thread can run.
3. On Multi processor, no other thread will be able to take this spin lock.
4. If interrupt handler needs to use lock, they can use spinlock. Also we must ensure that local interrupts are disable before obtaining the lock. Otherwise, it is possible for an interrupt handler to interrupt the kernel code while lock is held. This in turn will make interrupt handler to spins, waiting for the lock to become available while lock holder thread will wait interrupt handler to complete. This is deadlock.

Now another scenario, Suppose you disabled interrupt on local processor and a process is holding the lock. At the same time interrupt came on other processor, it will make process to release the spinlock.

Wednesday, February 5, 2014

Why Interrupts can't sleep ? And Nested Interrupt

So Lets start with a simple and very important question, why Interrupt can't sleep.

First go back to our understanding to User Context, there we can sleep why? because when you sleep in it, you can call schedule() which can ask other processes to run.

But when you are in Interrupt context, the only thing residing on that Processor is the Interrupt handler(of the respective interrupt). So if you sleep now, there is nothing else to schedule. So its like processor itself is in sleep which is not good at all.

Nested Interrupt

Now there are few things associated with Interrupt, like nested interrupts. So Nested interrupt is when you are running and interrupt handler and more interrupts come on the same line. So in interrupt handler you are getting another interrupt.

When this situation arises ?
Suppose for some reason you need to call enable_irq() in interrupt handler, so it means you are telling your hardware that you are ready to receive the new interrupt.

How Kernel Handles it ?
For this situation, Kernel have third mode called System Mode( apart from user and kernel mode). This is also a privileged mode. So if you want to enable nested interrupt you should change mode to system, and then you should enable interrupt. Before enabling interrupt, you keep LR and SPSR of IRQ mode in Stack of System mode.
This way we save LR and SPSR from being corrupted. And once the new interrupt handled it can return to the previous interrupt and finally it can go back to original location.
you can refer this link for more clarity.

Thursday, January 16, 2014

SurfaceFlinger Basics

Thank you Guys, For such a great response for my last post on SurfaceFlinger. Here I am trying to add few more things which will help you clear your understanding on SurfaceFlinger.

Terms User in SurfaceFlinger:

Canvas : 2D drawing object, object which used to draw on screen bitmap.
Surface : Drawing Buffer
Window : A visual area containing some kind of user interface.
SurfaceFlinger : Surface Manager
View : This is user interface, it’s a rectangular area on the screen for drawing and event handling.
ViewGroup : Subclass of view that contain other views(called children). It is the base class for layouts which are invisible containers that hold other views or viewgroups.

How Views are Created:

When view is ready to be drawn it will inform using requestLayout() to the parent to schedule a traversal.
Which is turn will traversal up till ViewRoot.
scheduleTraversal() will post message to UI thread to start performTraversal().
performTraversal() does 3 things.

Measure Pass : It is called to find out how big a view should be and how it should be measured and positioned.
Layout Pass : Each parent will position all of its children using the sizes computed in the measure pass.
Deawing : ViewRoot will get the canvas by calling surface.lockCanvas(), it updates the canvas by passing it to view.drawCanvas().

After drawing the ViewRoot will pass the canvas to surface by surface.unlockCanvasAndPost().

As mentioned earlier, whenever there is a change in view, relayoutWindow()[ViewRootImpl.java] will be called. Further flows will be like

relayoutWindow()[ViewrootImpl.java] ---> relayoutWindow()[WindowManagerService.java]------

-->createSurfaceLocked()[WindowStateAnimator.java] -----> SurfaceControl()[SurfaceControl.java]---

---> nativeCreate()[android_view_SurfaceControl.cpp] -------> createSurface()[Client.cpp] --------

--> createLayer()[SurfaceFlinger.cpp] --------> createNormalLayer()[SurfaceFlinger.cpp] -----------

--> setBuffer()[Layer.cpp] ------->

setDefaultBufferSize()/ setDefaultBufferFormat()/ setDefaultBufferUsageBits()[BufferQueue.cpp]

How Surface is drawn:

So now as surface has been created, so created surface reference will be returned to ViewRootImpl. Further it will call

draw()[ViewRootImpl.java]----------------------------------------------------------------------------------> drawSoftware()[ViewRootImpl.java] ---------> lockCanvas()[Surface.java]

---------> drawColors()[Canvas.java]

---------> draw()[View.java]

---------> unlockCanvas()[Surface.java]

Here draw() [View.java]draws different things like 1) draw background 2) draw views content 3) draw children(like animation) 4) draw decorations( scrollbars for ex).

lockCanvas() is used to attach the canvas to layer and internally it calls dequeueBuffer() which gets the GraphicBuffer to put the content in it.

And unlockCanvas() is used to detach the canvas from layer and internally it will call queuebuffer() which is used to post the content.

Added one more post on SurfaceFlinger, ,check it out here