Saturday 28 November 2015

Your complete guide to analysing National Sample Survey Office (NSSO) data

NSSO DATA ANALYSIS
Welcome!

Without wasting any of your precious time, we will take you through the process step by step, using India's 68th NSSO round as an example. All your queries will be answered within 3 days!

STEP-1 : Understanding the sampling methodology and calculation of sampling weights

Here, we will illustrate the methodology for sub-sample-wise estimates in the rural sector for the Consumer Expenditure Survey of the 68th round of NSSO (2011-12). Let's say we want to see the data for Meghalaya, as in Fig-1.

Meghalaya has several districts as can be seen in the map below in Fig-2.


NSSO has a concept of an NSS State-Region: an NSS state-region is a contiguous group of districts within a State having similar topography, agro-economic characteristics and population densities. For bigger States the number of regions goes up to 7, while for smaller States/UTs there is only one region. The regions have distinctive geographical features and climatic conditions, which makes regional estimates more meaningful and useful in some respects. The boundaries of a region generally do not cut across district or State boundaries. A region may comprise a single district for small States/UTs, but for larger States there may be 10-12 districts in a region.
In our example, the entire State of Meghalaya has been assigned as a single region, viz. NSS-SR = 171. Check out Fig-5 below.


Central Sample & State Samples: In most NSS rounds, the survey for the State sample is done by the respective State Governments, while the Central sample is surveyed by NSSO. State samples and Central samples are drawn and allocated separately.

For rural India, 59,695 households (SSUs, explained later) from 7,469 villages (PSUs/FSUs, explained later) selected across the country were surveyed under the Central sample. Subsequently we will discuss only the Central sample.

Schedule Type 1 & Type 2: Type 1 questionnaires are based upon consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based upon the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
Schedule Type 1 and Schedule Type 2 were canvassed in two independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, under Schedule Type 1, 59,695 households were surveyed, while under Schedule Type 2, 59,683 households were surveyed separately, from the same 7,469 villages under the Central sample. We will discuss only Type 1 samples subsequently.

Stratum: Districts within the NSS-SR are taken as the primary strata. Each stratum is further split into two sectors, viz. the Urban Sector and the Rural Sector; we will discuss only the rural sector in this example. In our example, we have 7 strata under the NSS-SR rural sector, each of them a distinct district with its own geographical boundary.
Sub-Stratum: If the number of households in a district is large, it is sub-divided into two or more sub-strata with nearly equal numbers of households, by grouping together contiguous villages having similar socio-economic characteristics.
In our example, the villages of West Garo Hills district (Stratum = 1, Distcd = 1) have been grouped into 6 sub-strata, in such a way that there are more or less equal numbers of households in each group. Check out the Fig-5 table above and Fig-3 below as a conceptual illustration.



FSU (First Stage Units): These are the villages in the sub-stratum, for the rural sector. Here, the sampling frame is the list of villages under Census 2011.
USU/SSU (Ultimate/Second Stage Units): These are the households available for survey within the village. Here, the sampling frame is all the households available for survey, not all the households in the village (please note the distinction, it is important!).
Sub-Rounds: The survey period is of one year's duration, starting 1st July and ending 30th June (India's agricultural year). This survey period is broken into 4 sub-rounds of 3 months' duration, as below:
Sub-round 1 : July-Sept
Sub-round 2 : Oct-Dec
Sub-round 3 : Jan-Mar
Sub-round 4 : Apr-Jun

Concept of Interpenetrating Sub-samples: The method consists in drawing samples of FSUs/villages in the form of 2 villages @ 8 households per village, i.e. 16 households, as ONE sub-sample, and ANOTHER sub-sample drawn independently in a similar manner. Thus, we have 2 SUB-SAMPLES IN EACH SUB-STRATUM. Check out the Fig-6 table below.
The main advantage is that the Relative Standard Error (RSE) of the estimates can be worked out easily with this method, even when the sample design is complex, as shown in the short sketch below.
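For instance, if the two sub-samples independently give estimates est1 and est2 of the same total, the combined estimate and a rough RSE can be worked out as in the small Stata sketch below. This is only an illustration of the general interpenetrating sub-sample idea, with made-up input values (est1 and est2 are assumptions), and not NSSO's official estimation code.
----
* Illustration only: est1 and est2 are totals estimated independently
* from sub-sample 1 and sub-sample 2 for the same characteristic (assumed values).
scalar est1 = 1250000
scalar est2 = 1310000
scalar combined = (est1 + est2)/2                  // pooled (sub-sample combined) estimate
scalar rse = 100*abs(est1 - est2)/(est1 + est2)    // rough relative standard error, in %
display "Combined estimate = " combined "    RSE = " rse " %"
----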

Second Stage Stratum (SSS): Each household available for survey in the selected FSU/village is categorised into ONE OF THREE categories, as follows, illustrated conceptually in Fig-4 below:

SSS 1 : Relatively affluent households
SSS 2 : Of the remaining, those having their principal earnings from non-agricultural sources
SSS 3 : All other remaining households


Generally, 2 households from SSS 1, 4 households from SSS 2 and 2 households from SSS 3 are selected from the "households available for survey" sampling frame IN THE FSU/VILLAGE, as explained above. (If some categories are not found, the shortfall is made up as per priority rules, ensuring ALWAYS that a total of 8 households is selected from each FSU/village.)

Fig-7 illustrates the case for only 4 villages, for your understanding.



Now we can show you how the sampling weights are calculated!



The meaning of the notations used by NSSO is as below:




Go back to Fig-6 

Can you try these now? Find out the relevant formula and check whether it matches the calculated weight given on the left! If you can, then you have definitely understood the sampling plan and the concept of multipliers/weights in NSSO unit-level data.

For the first row, the calculation of the multiplier for this household is worked out below. See the complete working in an Excel file here!  => ($V$2/4)*(1/$BG$15)*(BG29/COUNTIF(AR2:AR9,1))
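If you would rather do the same arithmetic in Stata than in Excel, a minimal sketch is given below. The variable names are purely illustrative assumptions (they are not the names used in the NSSO unit-level files); the three factors simply mirror the three terms of the spreadsheet formula above.
----
* Illustrative sketch with assumed variable names; map them to your own working file.
* frame_size      : size measure of the sub-stratum frame (first term above)
* n_sample_fsu    : number of sample villages the frame total is divided by (the 4 above)
* village_size    : size measure of the selected village (second term above)
* hh_listed_sss   : households listed in this second stage stratum in the village
* hh_surveyed_sss : households actually surveyed in this SSS in the village (the COUNTIF term)
gen double multiplier = (frame_size/n_sample_fsu) * (1/village_size) * (hh_listed_sss/hh_surveyed_sss)
----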


Congratulations!

Thank you! I hope you enjoyed understanding this apparently complex stuff as much as I enjoyed preparing its explanation!

Next, we will look at how to decipher and analyse NSSO data for a given problem.

So for now, good-bye and good luck!



44 comments:

  1. LOOKING FORWARD...

  2. Earlier I had tried to understand the sampling process of NSSO step by step, but it was quite laborious and painstaking. Your blog made several things much clearer.

  3. I am working with the NSSO 51_2.2 round data set. I have done the extraction process and separated the files for all levels. While I am trying to create a unique id for merging files, I am facing some problems. For the first two levels, I successfully created the unique id, but for the rest of the levels I am not able to create a unique id. Every time, whatever combination I take, it happens not to be unique. Please help.

    Replies
    1. https://nssodataanalysis.wordpress.com
      https://nssodataanalysis.blogspot.com

      Visit above blogs to get the help on NSSO data.

  4. /* Most important code below, for fixing the identity of households in all level files, for all time! All level files will compulsorily have these variables within Common Items 1-35, in the same position and with the same number of bytes. You must generate it every time, for every level file you want to use, to maintain a common identity of the household! */

    gen hhid = fsu + hg + sss + hhno   /* assumes fsu, hg, sss & hhno are string variables, so that "+" concatenates them */

    See Step-2 of this blog!

  5. It is very much helpful. Thanks for your blog. It is very nice and clear.

    Replies
    1. Thank you! This is a complex topic. I will be happy to help other scholars like you; please tell them about this blog!

    2. Hello Sir!

      Your blog is indeed very useful and informative. I would like to ask for your help with my specific problem.
      I am working on NSS-68th Round-Consumption expenditure (Type II).

      STEP I: I have extracted the data of Level 2 and Levels 5 to 9, and thereafter I created a common household id using the command: egen hhid = concat(FSUno HamletGp SecStageStratum SampleHHno)

      STEP II: Thereafter, I merged Level 2 with Levels 5, 6, 7, 8 and 9, as I required data by household type, for which the info is given in Level 2.

      STEP III: Now, after having the required data, i.e. total consumption and expenditure for the 6 household types in rural areas and 4 household types in urban areas for every product (item code wise), I am having trouble understanding the numbers: do they give monthly, weekly or annual figures for these household types? For my analysis I require the total annual consumption for all these household types, product wise.

      PLEASE help me out!!

      Thanks

    3. Type 1 questionnaires are based upon consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based upon the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
      Schedule Type 1 and Schedule Type 2 were canvassed in two independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, under Schedule Type 1, 59,695 households were surveyed, while under Schedule Type 2, 59,683 households were surveyed separately, from the same 7,469 villages under the Central sample.

  6. Thank you for sharing the Google sheet with me. That was very helpful.

    I am still having trouble with the multiplier files for the 71st round health expenditure dataset. Do you know why the readme file mentions that the mlt values are already included in each of the level files for the health schedule, while in the actual data they are not there but in a separate multiplier file? Would it be adequate to get the mlt for each household and apply the multiplier across all the levels?

    Thank you very much.
    -Suraj

    Replies
    1. https://nssodataanalysis.wordpress.com
      https://nssodataanalysis.blogspot.com

      Visit above blogs to get the help on NSSO data.

  7. Thank you very much for sharing the sheet. Please provide me edit access.

  8. thanks. the sheet helped me to understand the weights concept.

  9. Can you please say how to convert monthly GVA into annual GVA in the 67th round of NSSO?

  10. Thank you, sir. This has helped me very much to understand the weights.

  11. Sir, I am working on NSSO 67th round (Schedule 2.34) unit level data on unorganised enterprises. I am not able to estimate the accurate figure for gross value added per enterprise by activity category, i.e., manufacturing, trade and services. Can you please help me?

    Replies
    1. Are you using Stata?
      Select the f(14.0g) format and not the f(14.1g) format. The idea is to allow fractions and not to round off. If it does not help, contact me at ujjwalseth11@yahoo.com

  12. Sir I am working on NSSO 68th round (Schedule 10) unit level data on employment & unemployment. I am not able to extract & understand the data in SPSS. Can you please help me? Thanks in advance.

    Replies
    1. https://nssodataanalysis.wordpress.com
      https://nssodataanalysis.blogspot.com

      Visit above blogs to get the help on NSSO data.

  13. Could anyone share the NSS 71st round health unit level data?

  14. I can do it in Stata but not SPSS!

  15. Hey,
    I have extracted the data using Stata at different levels. All of them have a common id. Now how do I merge all of them?

    Replies
    1. NSSO Unit level Data Analysis (https://nssounitlevel.wordpress.com/)
      Visit the above website for more updates on NSSO data, analysis and Stata software.

  16. Your blog is very informative and I did the complete extraction exactly as you described. However, I think you should explain a bit more about creating the unique household id and person id, because the process you showed for making hhid was giving wrong outcomes in my results. Using the formula given here for hhid, we do not get 'unique' hhids, due to the use of the (+) operator in the formula.
    However, after doing a lot of research I found an alternative which people can use if their files also have this problem. The problem shows up nowhere except while merging the data using the 'm:1' or '1:m' command.
    In such a case you may use the following commands (variable names may vary in your dataset):
    ----
    cd "D:\Dropbox\NSSO files\68"   // cd sets the current working directory
    use "D:\Dropbox\NSSO files\68\68 level 3.dta", clear
    egen unique_hh_id = concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
    codebook unique_hh_id
    save "D:\Dropbox\NSSO files\68\68_level3.dta", replace   // keep level 3 with its new id

    use "D:\Dropbox\NSSO files\68\68 level 2.dta", clear
    br FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1
    egen unique_hh_id = concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
    codebook unique_hh_id
    save "D:\Dropbox\NSSO files\68\68_level2.dta", replace

    use "D:\Dropbox\NSSO files\68\68_level3.dta", clear   // reload level 3 before merging
    merge m:1 unique_hh_id using 68_level2
    save "D:\Dropbox\NSSO files\68\level_3_and_2.dta", replace
    -----
    You may change the paths and variable names as per your requirements.

  17. Hello sir, I am working on the 70th round data of NSSO in SPSS. Now I want to merge two data sets of the 70th round, say level 2 and level 14, in SPSS, but I could not. While computing indebted households I encounter the problem of duplication. Can you please help me?

  18. Hello sir, I am working on the 70th round data of NSSO in SPSS. Now I want to merge two data sets of the 70th round, say level 2 and level 14, in SPSS, but I could not. While computing indebted households I encounter the problem of duplication. Can you please help me?

  19. How do we generate a unique ID for merging the individual and household files in the NSSO 71st round?


  20. I’m heartily grateful to you for this marvelous post. And I will come back soon to get more posts.
    facial extraction singapore

  21. Extraction of data from the 61st round, employment and unemployment. Please help me.

  22. Sir,
    I understood that the combined weight should be used during analysis of aggregate data. But if I want to analyse each state separately, which weight do I have to use? Could you please tell me where we have to use the 'Final weight for sub-sample wise estimates'? Please clarify the usage of 'Final weight for sub-sample wise estimates' and 'Final weight for sub-sample combined estimates'.

  23. Sir,
    I understood that the combined weight should be used during analysis of aggregate data. But if I want to analyse each state separately, which weight do I have to use? Could you please tell me where we have to use the 'Final weight for sub-sample wise estimates'? Please clarify the usage of 'Final weight for sub-sample wise estimates' and 'Final weight for sub-sample combined estimates'.
    Tinu Joseph
    tinjos@gmail.com
    9823315068

    Replies
    1. Sub-samples are created for cross-checking, to ensure that we get valid estimates. They have nothing to do with State-wise estimation.

  24. This comment has been removed by the author.

  25. Can anyone help me regarding extraction of data from the 61st round, Schedule 10?

  26. satpura agro mart
    Great! Nice info. We like your post.

  27. How can I calculate broad employment (ps+ss) in the 68th round, Schedule 10?

  28. Thanks for sharing this wonderful information. I too learn something new from your post.
    Informatica Training in Chennai
    Informatica Training in Bangalore

  29. I am really very happy to visit your blog. I directly found what I truly need. Please visit our website for more information
    Data Scraping Service in India 2022

  30. Very Informative and creative contents. This concept is a good way to enhance the knowledge. thanks for sharing.
    Continue to share your knowledge through articles like these, and keep posting more blogs.
    And more Information Data Scraping Service in USA


  31. Very Informative and creative contents. This concept is a good way to enhance the knowledge. thanks for sharing.
    Continue to share your knowledge through articles like these, and keep posting more blogs.
    And more Information Data scraping service in Australia
