Welcome!
Without wasting any of your precious time, we will take you through the process step by step, using the 68th round of India's NSSO as an example. All your queries will be answered within 3 days!
STEP-1 : Understanding the sampling methodology and calculation of sampling weights
Here, we will illustrate the methodology for sub-sample-wise estimates in the rural sector for the consumer expenditure survey of the 68th round of NSSO (2011-12). Let's say we want to see the data for Meghalaya, as in Fig 1.
Meghalaya has several districts as can be seen in the map below in Fig-2.
NSSO has a concept of the NSS State-Region: an NSS state-region is a contiguous group of districts within a State having similar topography, agro-economic characteristics and population densities. Bigger States have up to 7 regions, while smaller States/UTs have only one. Because each region has distinctive geographical features and climatic conditions, regional estimates are more meaningful and useful in some respects. The boundaries of a region generally do not cut across district or State boundaries. A region may comprise a single district in small States/UTs, but in larger States a region may contain 10-12 districts.
In our example, the entire State of Meghalaya has been assigned to a single region, viz. NSS-SR = 171. See Fig-5 below.
Central Sample & State Samples: In most NSS rounds, the State sample is surveyed by the respective State Governments, while the Central sample is surveyed by NSSO. State samples and Central samples are drawn or allocated separately.
For rural India, 59,695 households (SSUs, explained later) from 7,469 villages (FSUs, explained later) across the country were surveyed under the Central sample. From here on, we discuss only the Central sample.
Schedule Type 1 & Type 2: Type 1 questionnaires are based on consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based on the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
Schedule Type 1 & Schedule Type 2 were canvassed in 2 independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, 59,695 households were surveyed under Schedule Type 1, while 59,683 households were surveyed separately under Schedule Type 2, from the same 7,469 villages of the Central sample. From here on, we discuss only Type 1 samples.
Stratum: Districts within the NSS-SR are taken as the primary strata. Each district is further split into 2 sectors, viz. the Urban Sector and the Rural Sector. We discuss only the rural sector in this example. In our example, the NSS-SR rural sector has 7 strata, each of which is a distinct district with its own geographical boundary.
Sub-Stratum: If a district has a large number of households, it is sub-divided into two or more sub-strata with nearly equal numbers of households by grouping together contiguous villages having similar socio-economic characteristics. In our example, the villages of West Garo Hills district (Stratum=1, Distcd=1) have been grouped into 6 sub-strata, such that each group has a more or less equal number of households. See the Fig-5 table above and Fig-3 below for a conceptual illustration.
FSU (First Stage Units): These are the villages in the sub-stratum for the rural sector. Here, the Sampling Frame is the list of villages under Census 2001.
USU/SSU (Ultimate/Second Stage Units): These are the households available for survey within the village. Here, the Sampling Frame is all the households available for survey, not all the households in the village (please note the distinction, it is important!).
Sub-Rounds: The survey period is one year, starting 1st July and ending 30th June (India's Agricultural Year). It is broken into 4 sub-rounds of 3 months each, as below:
Sub-round 1 : July-Sept
Sub-round 2 : Oct-Dec
Sub-round 3 : Jan-Mar
Sub-round 4 : Apr-Jun
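As a small illustration, the sub-round calendar above can be written as a lookup. This is only a sketch in Python; the function name `nss_sub_round` is made up here and is not an NSSO variable.

```python
from datetime import date


def nss_sub_round(survey_date: date) -> int:
    """Return the sub-round (1 to 4) for a date in a July-to-June survey year."""
    # Shift the calendar so that July becomes month 0, then cut into quarters:
    # Jul-Sep -> 1, Oct-Dec -> 2, Jan-Mar -> 3, Apr-Jun -> 4
    return ((survey_date.month - 7) % 12) // 3 + 1


print(nss_sub_round(date(2011, 8, 15)))  # 1
print(nss_sub_round(date(2012, 2, 1)))   # 3
```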
Concept of Interpenetrating Sub-samples: The method consists in drawing a sample of 2 FSUs/villages at 8 households (HH) per village, thus 16 HHs, as ONE sub-sample, and then drawing ANOTHER sub-sample independently in the same manner. Thus, we have 2 SUB-SAMPLES IN EACH SUB-STRATUM. See the Fig-6 table below.
The main advantage is that the Relative Standard Error (RSE) of estimates can be found easily with this method, even when the sample design is complex.
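To see why two independent sub-samples make the RSE easy to obtain, here is a minimal Python sketch for a single stratum. It uses the standard result that, with two independent replicate estimates, the variance of their mean is estimated by the squared difference divided by 4. The function name and the two input figures are invented for illustration; with several strata, the variances would be added across strata before taking the square root.

```python
import math


def combined_estimate_and_rse(est_ss1: float, est_ss2: float):
    """Combine two independent sub-sample estimates; return (estimate, RSE in %)."""
    combined = (est_ss1 + est_ss2) / 2
    # Variance of the mean of two independent replicates: (difference)^2 / 4
    variance = (est_ss1 - est_ss2) ** 2 / 4
    rse_percent = 100 * math.sqrt(variance) / combined
    return combined, rse_percent


# Hypothetical sub-sample estimates of the same total
estimate, rse = combined_estimate_and_rse(1040.0, 960.0)
print(estimate, rse)  # 1000.0 4.0
```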
Second Stage Stratum (SSS): Each household available for survey in the selected FSU/village is classified into ONE OF THREE categories, as follows (illustrated conceptually in Fig-4 below):
SSS 1 Relatively affluent
SSS 2 Of the remaining, having principal earning from non-agricultural sources
SSS 3 All other remaining
Generally, 2 households from SSS 1, 4 households from SSS 2 and 2 households from SSS 3 are selected from the frame of households available for survey IN THE FSU/VILLAGE, as explained above. (If some category has too few households, the shortfall is made up as per priority rules, ALWAYS ensuring that 8 households in total are selected from each FSU/village.)
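The 2 + 4 + 2 allocation with a make-up for shortfalls can be sketched in Python as below. Please note the assumption: the order in which a shortfall is compensated here (SSS 1, then 2, then 3) is only a placeholder chosen for illustration, not NSSO's official priority rule, and the function name `allocate_households` is hypothetical.

```python
TARGET = {1: 2, 2: 4, 3: 2}  # households to select per SSS, as stated in the text


def allocate_households(available: dict) -> dict:
    """Allocate up to 8 sample households across SSS 1-3.

    `available` maps SSS number -> households available for survey in the village.
    """
    # First take the target number from each SSS, or all of them if there are fewer.
    alloc = {s: min(TARGET[s], available.get(s, 0)) for s in TARGET}
    shortfall = sum(TARGET.values()) - sum(alloc.values())
    # Placeholder priority order (an assumption): fill the gap from SSS 1, 2, 3 in turn.
    for s in sorted(TARGET):
        if shortfall == 0:
            break
        spare = available.get(s, 0) - alloc[s]
        extra = min(spare, shortfall)
        alloc[s] += extra
        shortfall -= extra
    return alloc


print(allocate_households({1: 5, 2: 20, 3: 30}))  # {1: 2, 2: 4, 3: 2}
print(allocate_households({1: 0, 2: 10, 3: 5}))   # {1: 0, 2: 6, 3: 2}
```

If the village has fewer than 8 households in all, the sketch simply returns every available household.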
Fig-7 illustrates the case for only 4 villages, for your understanding.
Now we can show you how the sampling weights are calculated!
The notations used by NSSO have the meanings given below.
Go back to Fig-6.
Can you try these now? Find the relevant formula and check whether it matches the calculated weight given on the left! If you can, then you have definitely understood the sampling plan and the concept of multipliers/weights in NSSO unit-level data.
For the first row, the multiplier for this household is worked out for you as follows. See the complete working in an Excel file here! => ($V$2/4)*(1/$BG$15)*(BG29/COUNTIF(AR2:AR9,1))
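The Excel formula above is a product of three factors, and the same structure can be sketched in Python. The reading of each cell is an assumption, since the notation table is not reproduced here: $V$2 is taken as the size of the stratum/sub-stratum, 4 as the number of sample villages, $BG$15 as the size of the selected village, BG29 as the number of households in the second stage stratum, and COUNTIF(AR2:AR9,1) as the number of surveyed households coded SSS 1. All argument names and numbers below are made up for illustration.

```python
def household_multiplier(stratum_size, n_villages, village_size,
                         sss_households, sss_codes, sss):
    """Multiplier as a product of three factors, mirroring the Excel formula.

    (stratum_size / n_villages) * (1 / village_size) * (sss_households / surveyed)
    """
    # Equivalent of COUNTIF(range, sss): surveyed households in this SSS
    surveyed = sss_codes.count(sss)
    return (stratum_size / n_villages) * (1 / village_size) * (sss_households / surveyed)


# Hypothetical inputs: 8 surveyed households in the village, 2 of them in SSS 1
codes = [1, 1, 2, 2, 2, 2, 3, 3]
weight = household_multiplier(40000, 4, 500, 30, codes, sss=1)
print(round(weight, 2))  # about 300
```

Each of the 2 surveyed SSS 1 households in this made-up village would then stand for roughly 300 households in the population.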
Congratulations!
Thank you! I hope you enjoyed understanding this apparently complex material as much as I enjoyed preparing its explanation!
Next, we will look at how to decipher and analyse NSSO data for a given problem.
So for now, good-bye and good luck!
LOOKING FORWARD...

Earlier I had tried to understand the sampling process of NSSO step by step, but it was quite laborious and painstaking. Your blog made several things much clearer.

I am working with the NSSO 51_2.2 round data set. I have done the extraction process and separated the files for all levels. While trying to create a unique id for merging the files, I am facing a problem. For the first two levels I successfully created the unique id, but for the rest of the levels I am not able to. Whatever combination I take, it turns out not to be unique. Please help.

https://nssodataanalysis.wordpress.com
https://nssodataanalysis.blogspot.com
Visit the above blogs to get help on NSSO data.

/*Most important code below for fixing the identity of households in all level files, for all time! All level files carry these variables within Common Items 1-35, in the same position and with the same number of bytes. You must generate it every time, for every level file you want to use, to maintain a common identity for the household! Note: in Stata, + joins the values only when the variables are strings; on numeric variables it adds them.*/
gen hhid = fsu + hg + sss + hhno
See Step-2 of this blog!

It is very helpful. Thanks for your blog. It is very nice and clear.

Thank you! This is a complex topic. I will be happy to serve other scholars like you! Please inform them!

Hello Sir!
Your blog is indeed very useful and informative. I would like to ask for your help with a specific problem.
I am working on the NSS 68th Round consumption expenditure data (Type II).
STEP I: I extracted the data of Level 2 and Levels 5 to 9, and then created a common household id using the command: egen hhid = concat ( FSUno HamletGp SecStageStratum SampleHHno )
STEP II: Thereafter, I merged Level 2 with Levels 5, 6, 7, 8 and 9, as I required the data by household type, for which the information is given in Level 2.
STEP III: Now I have the required data, i.e. total consumption and expenditure for the 6 household types in rural areas and the 4 household types in urban areas, for every product (item-code-wise), but I am having trouble understanding the numbers. Do they give monthly, weekly or annual data for these household types? For my analysis I require the total annual consumption for all these household types, product-wise.
PLEASE help me out!!
Thanks

Type 1 questionnaires are based on consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based on the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
Schedule Type 1 & Schedule Type 2 were canvassed in 2 independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, 59,695 households were surveyed under Schedule Type 1, while 59,683 households were surveyed separately under Schedule Type 2, from the same 7,469 villages of the Central sample.

Thank you for sharing the Google sheet with me. That was very helpful.
I am still having trouble with the multiplier files for the 71st round health expenditure dataset. Do you know why the readme file says that the mlt values are already included in each of the level files for the health schedule, when in the actual data they are not there but in a separate multiplier file? Would it be adequate to get the mlt value for each household from that file and apply the multiplier across all the levels?
Thank you very much.
-Suraj

Thank you very much for sharing the sheet. Please provide me edit access.

Thanks. The sheet helped me to understand the weights concept.

Can you please say how to convert monthly GVA into annual GVA in the 67th round of NSSO?

Thank you, sir. This has helped me very much to understand the weights.

Sir, I am working on NSSO 67th round (Schedule 2.34) unit level data on unorganised enterprises. I am not able to estimate an accurate figure for gross value added per enterprise by activity category, i.e., manufacturing, trade and services. Can you please help me?

Are you using Stata?
Select the f(14.0g) format and not the f(14.1g) format. The idea is to allow fractions and not to round off. If it does not help, contact me at ujjwalseth11@yahoo.com

Sir, I am working on NSSO 68th round (Schedule 10) unit level data on employment & unemployment. I am not able to extract and understand the data in SPSS. Can you please help me? Thanks in advance.

Could anyone share the NSS 71st round health unit level data?

I can do it in Stata but not SPSS!

Hey,
I have extracted the data using Stata at different levels. All of them have a common id. Now how do I merge all of them?

NSSO Unit level Data Analysis (https://nssounitlevel.wordpress.com/)
Visit the above website for more updates on NSSO data, analysis and Stata software.

This was explained; check above!

Your blog is very informative and I did the complete extraction just as you described. However, I think you should explain a bit more about creating the unique household id and person id, because the process you showed for making hhid gave wrong results for me. When one calculates hhid with the formula given here, the hhids are not 'unique', because of the (+) operator in the formula.
However, after a lot of research I found an alternative which people can use if their files have the same problem. The problem shows up only when merging the data with the 'm:1' or '1:m' option.
In such a case you may use the following commands (variable names may vary in your dataset):
----
cd "D:\Dropbox\NSSO files\68"    // cd sets the current working directory
* Level 2: build the household id and save (for m:1 this file must have one row per household)
use "D:\Dropbox\NSSO files\68\68 level 2.dta", clear
br FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1
egen unique_hh_id= concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
codebook unique_hh_id
save "D:\Dropbox\NSSO files\68\68_level2.dta", replace
* Level 3: build the same id, then merge the saved Level 2 file into it
use "D:\Dropbox\NSSO files\68\68 level 3.dta", clear
egen unique_hh_id= concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
codebook unique_hh_id
merge m:1 unique_hh_id using 68_level2
save "D:\Dropbox\NSSO files\68\level_3_and_2.dta", replace
-----
You may change the paths and variable names as per your requirements.

Hello sir, I am working on the 70th round data of NSSO in SPSS. Now I want to merge two data sets of the 70th round, say level 2 and level 14, in SPSS, but I could not. While computing indebted households I encounter the problem of duplication. Can you please help me?

how do we generate a unique ID for merging the individual and household file in NSSO 71st round?

Extraction of data for the 61st round employment and unemployment survey: please help me.

Sir,
I understood that the combined weight should be used when analysing aggregate data. But if I want to analyse each State separately, which weight do I have to use? Could you please tell me where we have to use the 'Final weight for sub-sample wise estimates'? Please clarify the usage of 'Final weight for sub-sample wise estimates' and 'Final weight for sub-sample combined estimates'.
Tinu Joseph
tinjos@gmail.com
9823315068

Sub-samples are created for cross-checking, to ensure that we get valid estimates. They have nothing to do with State-wise estimation.

Can anyone help me regarding extraction of data for the 61st round, Schedule 10?

How can I calculate broad employment (ps+ss) for the 68th round, Schedule 10?