Saturday, 28 November 2015

Your complete guide to analysing National Sample Survey Office (NSSO) data

NSSO DATA ANALYSIS
Welcome!

Without wasting any of your precious time, we will take you through the process step by step, using India's 68th NSSO round as an example. All your queries will be answered within 3 days!

STEP-1 : Understanding the sampling methodology and calculation of sampling weights

Here, we will illustrate the methodology for sub-sample-wise estimates in the rural sector for the consumer expenditure survey of the 68th round of NSSO (2011-12). Let's say we want to see the data for Meghalaya, as in Fig-1.

Meghalaya has several districts as can be seen in the map below in Fig-2.


NSSO has the concept of an NSS State-Region: an NSS state-region is a contiguous group of districts within a State having similar topography, agro-economic characteristics and population densities. For bigger States the number of regions goes up to 7, while for smaller States/UTs there is only one region. The regions have some distinctive geographical features and climatic conditions, and this makes regional estimates more meaningful and useful in some respects. The boundaries of a region generally do not cut across district boundaries or State boundaries. A region may comprise a single district for small States/UTs, but for larger States there may be 10-12 districts in a region.
In our example, the entire State of Meghalaya has been assigned as a single region, viz. NSS-SR = 171. Check out Fig-5 below.
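As a small hedged aside in Stata: assuming the usual coding convention, consistent with the example here, that the first two digits of the three-digit NSS region code are the State code (17 = Meghalaya), the State code and the region serial number can be split out of the region code. The variable name NSS_Region below is hypothetical.

gen state_code = floor(NSS_Region / 10)   // 171 -> 17 (Meghalaya)
gen region_no  = mod(NSS_Region, 10)      // 171 -> 1 (Meghalaya's only region)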


Central Sample & State Sample: In most NSS rounds, the survey for the State sample is carried out by the respective State Governments, while the Central sample is surveyed by NSSO. State samples and Central samples are drawn and allocated separately.

For rural India, 59,695 households (SSUs, explained later) from 7,469 villages (FSUs/PSUs, explained later) selected across the country were surveyed under the Central sample. Subsequently we will discuss only the Central sample.

Schedule Type 1 & Type 2: Type 1 questionnaires are based upon consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based upon the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
Schedule Type 1 and Schedule Type 2 were canvassed in two independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, under Schedule Type 1, 59,695 households were surveyed, while under Schedule Type 2, 59,683 households were surveyed separately from the same 7,469 villages under the Central sample. Subsequently we will discuss Type 1 samples only.
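As a hedged aside on how these recall periods are usually handled at analysis time (a generic sketch, not a rule quoted from the NSSO documentation): item values collected with different recall periods are commonly brought to a common 30-day reference before aggregation. The variable names value and recall_days below are hypothetical names in an item-level file.

* bring item values to a 30-day reference period; recall_days is 7, 30 or 365
gen double value_30day = value * 30 / recall_days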

Stratum: Districts within the NSS-SR are taken as the primary strata. Each stratum is further split into two sectors, viz. the Urban sector and the Rural sector. We will discuss only the rural sector in this example. In our example, we have 7 strata under the rural sector of the NSS-SR, each of them a distinct district with its own geographical boundary.
Sub-Stratum: If the number of households in a district is large, the district is sub-divided into two or more sub-strata of nearly equal size by grouping together contiguous villages having similar socio-economic characteristics.
In our example, the villages of West Garo Hills district (Stratum = 1, Distcd = 1) have been grouped into 6 sub-strata in such a way that there are more or less equal numbers of households in each group. Check out the Fig-5 table above and Fig-3 below for a conceptual illustration.
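To make the grouping idea concrete, here is a purely conceptual sketch in Stata, with hypothetical variable names (village_order, hh_count): contiguous villages are cut into 6 sub-strata of roughly equal household counts using the cumulative household total. The actual sub-stratification is done by NSSO before sampling, not by the analyst.

* village-level file for one district, sorted in the intended contiguous order
sort village_order
gen cum_hh = sum(hh_count)                  // running total of households
summarize cum_hh, meanonly
gen substratum = ceil(6 * cum_hh / r(max))  // 6 roughly equal slices of the total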



FSU (First Stage Units): These are the villages in the sub-stratum, for the rural sector. Here, the sampling frame is the list of villages under Census 2011.
USU/SSU (Ultimate/Second Stage Units): These are the households available for survey within the village. Here, the sampling frame is the set of households available for survey, not all the households in the village (please note the distinction, it is important!).
Sub-Rounds: The survey period is of one year's duration, starting 1st July and ending 30th June (India's agricultural year). This survey period is broken into 4 sub-rounds of 3 months each, as below (a small sketch after the list shows how the sub-round follows from the month of survey):
Sub-round 1 : July-Sept
Sub-round 2 : Oct-Dec
Sub-round 3 : Jan-Mar
Sub-round 4 : Apr-Jun
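As a hedged sketch (month_of_survey is a hypothetical variable name; in the unit-level files the survey date is coded in its own field), the sub-round can be derived from the month of survey like this:

gen subround = .
replace subround = 1 if inlist(month_of_survey, 7, 8, 9)     // Jul-Sep
replace subround = 2 if inlist(month_of_survey, 10, 11, 12)  // Oct-Dec
replace subround = 3 if inlist(month_of_survey, 1, 2, 3)     // Jan-Mar
replace subround = 4 if inlist(month_of_survey, 4, 5, 6)     // Apr-Jun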

Concept of Interpenetrating Sub-samples: The method consists of drawing samples of FSUs/villages in the form of 2 villages @ 8 households per village, i.e. 16 households, as ONE sub-sample, with ANOTHER sub-sample drawn independently in the same manner. Thus, we have 2 SUB-SAMPLES IN EACH SUB-STRATUM. Check out the Fig-6 table below.
The main advantage is that the Relative Standard Error (RSE) of the estimates can easily be worked out with this method, even when the sample design is complex.
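To see why, here is a hedged illustration of the textbook property of two independent sub-samples (the general principle, not a formula quoted from the NSSO estimation note): if x1 and x2 are the two sub-sample estimates of the same quantity, the combined estimate is their mean and their difference gives an estimate of its standard error. The numbers below are purely illustrative.

scalar x1 = 1250                              // estimate from sub-sample 1 (illustrative)
scalar x2 = 1310                              // estimate from sub-sample 2 (illustrative)
scalar combined = (x1 + x2) / 2               // pooled estimate
scalar rse = 100 * abs(x1 - x2) / (x1 + x2)   // RSE in per cent
display "combined = " combined "   RSE(%) = " rse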

Second Stage Stratum (SSS): Each household available for survey in the selected FSU/village is categorised into ONE OF THREE categories, as follows, illustrated conceptually in Fig-4 below:

SSS 1: Relatively affluent households
SSS 2: Of the remaining, households having their principal earnings from non-agricultural sources
SSS 3: All other remaining households


Generally 2 households from SSS 1, 4 households from SSS 2 and 2 households from SSS 3 are selected from the frame of households available for survey IN THE FSU/VILLAGE, as explained above. (If some categories are not found, the shortfall is made up as per priority rules, ensuring ALWAYS that in total 8 households are selected from each FSU/village.)
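If you want to see this 2-4-2 pattern in your own extract of the Type 1 household-level file, a quick hedged sketch follows; fsu and sss are hypothetical names for the village serial number and the second stage stratum of the household, and the actual names depend on how you extracted the files.

bysort fsu: gen hh_per_fsu = _N       // should be 8 selected households per village
bysort fsu sss: gen hh_per_sss = _N   // nominally 2, 4 and 2 for SSS 1, 2 and 3
tabulate hh_per_fsu
tabulate sss hh_per_sss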

Fig-7 illustrates the case for just 4 villages, for your understanding.



Now we can show you how the sampling weights are calculated!



The meanings of the notations used by NSSO are given below:




Go back to Fig-6.

Can you try these now? Find out the relevant formula and check whether it matches the calculated weight as given on the left! If you can, then you have definitely understood the sampling plan and the concept of multipliers/weights in NSSO unit-level data.

For the first row, the calculation of the multiplier for this household is worked out for you below. See the complete working in an Excel file here!  => ($V$2/4)*(1/$BG$15)*(BG29/COUNTIF(AR2:AR9,1))
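For orientation only, here is a simplified Stata sketch of the two-stage logic behind such a multiplier. This is NOT the exact NSSO formula, which works through the selection probabilities of the villages and includes further adjustments (see the round's estimation procedure document); all variable names below are hypothetical.

* vill_in_frame : villages in the sub-stratum frame
* vill_surveyed : sample villages surveyed in the sub-stratum for this sub-sample
* hh_in_frame   : households listed in the SSS frame of the village
* hh_surveyed   : households surveyed from that SSS in the village
gen double weight_sketch = (vill_in_frame / vill_surveyed) * (hh_in_frame / hh_surveyed)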


Congratulations!

Thank you! I hope you enjoyed understanding this apparently complex stuff as much as I enjoyed preparing its explanation!

Next, we will look at how to decipher and analyse NSSO data for a given problem.

So for now, good-bye and good luck!



Comments:

  1. LOOKING FORWARD...

  2. Earlier I had tried to understand the sampling process of NSSO step by step, but it was quite laborious and painstaking. Your blog made several things much clearer.

  3. I am working with the NSSO 51_2.2 round data set. I have done the extraction process and separated the files for all levels. While I am trying to create a unique id for merging the files, I am facing some problems. For the first two levels I successfully created the unique id, but for the rest of the levels I am not able to create a unique id. Every time, whatever combination I take, it turns out not to be unique. Please help.

    Replies
    1. https://nssodataanalysis.wordpress.com
      https://nssodataanalysis.blogspot.com

      Visit the above blogs to get help on NSSO data.

  4. /*Most important code below for fixing identity of Households in all level files for all times! All level files will have these variables compulsorily within Common Items 1-35 in the same position and have the same number of bytes. You must generate it every time for every level file you want to use for maintaining common identity of the household!*/

    gen hhid= fsu+ hg+ sss+ hhno

    See Step-2 of this blog!

  5. It is very much helpful. Thanks for your blog. It is very nice and clear.

    Replies
    1. Thank you! This is a complex topic. I will be happy to help other scholars like you. Please inform them!

    2. Hello Sir!

      Your blog is indeed very useful and informative. I would like to ask your help in my specific problem.
      I am working on NSS-68th Round-Consumption expenditure (Type II).

      STEP I: I have extracted the data of Level 2, Level 5 to Level 9 and thereafter, I created a common household id using a command: egen hhid = concat ( FSUno HamletGp SecStageStratum SampleHHno )

      STEP II: Thereafter, I merged Level 2 with 5, 6, 7, 8 and 9 as I required data via household type, for which the info is given in Level 2.

      STEP III: Now, after having the required data, i.e. total consumption and expenditure for the 6 household types in rural areas and the 4 household types in urban areas for every product (item-code wise), I am having trouble understanding the numbers: do they give monthly, weekly or annual data for these household types? For my analysis I require the total annual consumption for all these household types, product-wise.

      PLEASE help me out!!

      Thanks

    3. Type 1 questionnaires are based upon consumption during the last 30 days and the last 365 days, while Type 2 questionnaires are based upon the last 365 days (infrequently purchased categories), the last 7 days (some food categories) and the last 30 days (other food, fuel, etc.).
      Schedule Type 1 and Schedule Type 2 were canvassed in two independent samples of matching size drawn from each stratum/sub-stratum (same FSUs/villages). Thus, under Schedule Type 1, 59,695 households were surveyed, while under Schedule Type 2, 59,683 households were surveyed separately from the same 7,469 villages under the Central sample.

  6. Thank you for sharing the Google sheet with me. That was very helpful.

    I am still having trouble with the multiplier files for the 71st round health expenditure dataset. Do you know why the readme file mentions that the mlt values are already included in each of the level files for the health schedule, when in the actual data they are not there but in a separate multiplier file? Would it be adequate to get the mlt file for each household and apply the multiplier across all the levels?

    Thank you very much.
    -Suraj

  7. Thank you very much for sharing the sheet. Please provide me edit access.

  8. Thanks. The sheet helped me to understand the weights concept.

  9. Can you please say how to convert monthly GVA into annual in the 67th round of NSSO?

  10. Thank you, sir. This has helped me very much to understand the weights.

  11. Sir, I am working on the NSSO 67th round (Schedule 2.34) unit-level data on unorganised enterprises. I am not able to estimate the accurate figure for gross value added per enterprise by activity category, i.e. manufacturing, trade and services. Can you please help me?

    Replies
    1. Are you using Stata?
      Select the f(14.0g) format and not the f(14.1g) format. The idea is to allow fractions and not to round off. If it does not help, contact me at ujjwalseth11@yahoo.com

  12. Sir, I am working on the NSSO 68th round (Schedule 10) unit-level data on employment & unemployment. I am not able to extract & understand the data in SPSS. Can you please help me? Thanks in advance.

  13. Could anyone share the NSS 71st round health unit-level data?

  14. I can do it in Stata but not SPSS!

  15. Hey,
    I have extracted the data using Stata at different levels. All of them have a common id. Now how do I merge all of them?

    Replies
    1. NSSO Unit level Data Analysis (https://nssounitlevel.wordpress.com/)
      Visit the above website for more updates on NSSO data, analysis and Stata software.

  16. Your blog is very informative and I did the complete extraction exactly as you described. However, I think you should explain a bit more about creating the unique household id and person id, because the process you showed for making hhid was giving wrong outcomes in my results. Using the formula given here for hhid, one does not get 'unique' hhids, due to the use of the (+) operator in the formula.
    However, after doing a lot of research I found an alternative which people can use if their files also have this problem. The problem shows up nowhere except while merging the data using the 'm:1' or '1:m' command.
    In such a case you may use the following commands (variable names may vary in your dataset):
    ----
    cd "D:\Dropbox\NSSO files\68" #cd is for making a current directory
    use "D:\Dropbox\NSSO files\68\68 level 3.dta"
    egen unique_hh_id= concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
    codebook unique_hh_id


    use "D:\Dropbox\NSSO files\68\68 level 2.dta"
    br FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1
    egen unique_hh_id= concat(FSU_Serial_No_1 Hamlet_group_1 Second_stage_1 Sample_hhld__No_1)
    codebook unique_hh_id
    save "D:\Dropbox\NSSO files\68\68_level2.dta"
    merge m:1 unique_hh_id using 68_level2
    save "D:\Dropbox\NSSO files\68\level_3_and_2.dta"
    -----
    You may change the paths and variable names as per your requirements.

  17. Hello sir, I am working on the 70th round data of NSSO in SPSS. Now I want to merge two data sets of the 70th round, say Level 2 and Level 14, in SPSS, but I could not. While computing indebted households I encounter the problem of duplication. Can you please help me?

  19. How do we generate a unique ID for merging the individual and household files in the NSSO 71st round?


  21. Please help me with the extraction of data from the 61st round employment and unemployment survey.

  23. Sir,
    I understood that the combined weight should be used during analysis of aggregate data. But if I want to analyse each state separately, which weight do I have to use? Could you please tell me where we have to use the 'Final weight for sub-sample wise estimates'? Please clarify the usage of 'Final weight for sub-sample wise estimates' and 'Final weight for sub-sample combined estimates'.
    Tinu Joseph
    tinjos@gmail.com
    9823315068

    Replies
    1. Sub-samples are created for cross-checking, to ensure that we get valid estimates. They have nothing to do with State-wise estimation.

  25. Can anyone help me regarding the extraction of data from the 61st round, Schedule 10?

  27. How can I calculate broad employment (PS+SS) from the 68th round, Schedule 10?

  33. This guide to analysing NSSO data is informative! The step-by-step approach helps in understanding the sampling methods and how the data is collected. Thanks for sharing!
