Analyzing College Data About First-Generation Students in the United States¶

Introduction¶

I decided to look at the College Scorecard data, focusing on the columns that describe first-generation college students. As a first-generation student of color, I have noticed that many other first-generation students take time off and sometimes do not return to Bowdoin. This observation inspired me to look at the completion rates of first-generation students at various types of institutions. I chose to look at colleges and universities whose highest degree awarded is a bachelor's degree. Restricting the analysis to these four-year institutions narrows the data and excludes vocational institutions and institutions that award only associate's degrees. Given that many students do not graduate within four years, I look at how likely first-generation students are to graduate within six years.

Information About The Topic¶

First-generation students have to overcome many obstacles when transitioning into college. Since many first-generation students come from disadvantaged backgrounds, it is often difficult for them to complete college within four years. For low-income first-generation students, moving into college can be quite a culture shock. The demands from home and the financial constraints are often difficult to balance in college. These added stresses can lead first-generation students to feel distressed, to feel like they do not belong, and to give up on school altogether. The majority of students' stress comes from financial constraints, so some studies have predicted that making college more affordable could help increase retention and completion rates among first-generation students. Even so, first-generation students continue to face disadvantages that prevent them from completing college.

Research Question¶

Which states have the lowest rates of first-generation students who graduate with a bachelor's degree within six years?

Hypothesis¶

I think it is logical to assume that bigger states such as Texas and California will have more first-generation students because they have a bigger pool of college-age students. Having a bigger pool of students means that it is more difficult to have higher completion rates when compared to other states. My prediction is that big states like Texas and California, along with the border states of New Mexico and Arizona, will have higher rates of first-generation students graduating with a bachelor's degree within six years. On a similar note, I feel that smaller places such as North Dakota, the District of Columbia, Vermont, and Connecticut will have lower completion rates because they have smaller populations.

Part One: Code and Subsetting the Data¶

\1. The code below loads the packages that I use in this notebook. They are especially important for my visuals and for merging the College Scorecard data with the states data.

library(ggplot2)
library(maps)
library(RColorBrewer)
library(rgdal)
library(sp)
library(rgeos)
library(maptools)

Warning message:
"package 'maps' was built under R version 3.3.3"
Warning message:
"package 'rgdal' was built under R version 3.3.3"
Loading required package: sp
Warning message:
"package 'sp' was built under R version 3.3.3"
rgdal: version: 1.2-6, (SVN revision 651)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.0.1, released 2015/09/15
 Path to GDAL shared files: C:/Users/Karla/Documents/R/win-library/3.3/rgdal/gdal
 Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
 Path to PROJ.4 shared files: C:/Users/Karla/Documents/R/win-library/3.3/rgdal/proj
 Linking to sp version: 1.2-4 
Warning message:
"package 'rgeos' was built under R version 3.3.3"
rgeos version: 0.3-23, (SVN revision 546)
 GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084 
 Linking to sp version: 1.2-4 
 Polygon checking: TRUE 

Warning message:
"package 'maptools' was built under R version 3.3.3"
Checking rgeos availability: TRUE

\2. The code below creates a data frame called states from the state outlines in the maps data and then shows a table of its first six rows.

states <- map_data("state")
head(states)

\3. I created a data frame called csc by loading a new spreadsheet, saved as a CSV file, that contains the following column variables:

INSTNM = Institution Name

STABBR = Abbreviated State Name

CONTROL = 1 for a Public School, 2 for a Private nonprofit, 3 for a Private for-profit School

LATITUDE

LONGITUDE

UGDS_HISP = Total Share of Enrollment of Undergraduate Degree-Seeking Students who are Hispanic

FIRSTGEN_COMP_ORIG_YR6_RT = Percent of First-Generation Students who Completed Within 6 Years at Original Institution

FIRST_GEN = Share/ Percentage of First-Generation Students

HIGHDEG = Highest Degree Awarded: 1 for a Certificate, 2 for an Associate's Degree, 3 for a Bachelor's Degree, and 4 for a Graduate Degree

REGION2 = 1 for New England (CT, ME, MA, NH, RI, VT), 2 Mid East (DE, DC, MD, NJ, NY, PA), 3 Great Lakes (IL, IN, MI, OH, WI), 4 Plains (IA, KS, MN, MO, NE, ND, SD), 5 Southeast (AL, AR, FL, GA, KY, LA, MS, NC, SC, TN, VA, WV), 6 Southwest (AZ, NM, OK, TX), 7 Rocky Mountains (CO, ID, MT, UT, WY), 8 Far West (AK, CA, HI, NV, OR, WA), 9 Outlying Areas (AS, FM, GU, MH, MP, PR, PW, VI)

csc <- read.csv("College_Data_FirstGen.csv", header = TRUE, stringsAsFactors = FALSE)

\4. The code below defines a function that turns the abbreviated state names in the STABBR column into lowercase full state names so that they can be matched with the "region" column in the map data.

#'x' is the column of a data.frame that holds 2-letter state codes
stateFromLower <-function(x) {
   #read 52 state codes into local variable [includes DC (Washington, D.C.) and PR (Puerto Rico)]
  st.codes<-data.frame(
                      state=as.factor(c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                                         "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME",
                                         "MI", "MN", "MO", "MS",  "MT", "NC", "ND", "NE", "NH", "NJ", "NM",
                                         "NV", "NY", "OH", "OK", "OR", "PA", "PR", "RI", "SC", "SD", "TN",
                                         "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY")),
                      full=as.factor(c("alaska","alabama","arkansas","arizona","california","colorado",
                                       "connecticut","district of columbia","delaware","florida","georgia",
                                       "hawaii","iowa","idaho","illinois","indiana","kansas","kentucky",
                                       "louisiana","massachusetts","maryland","maine","michigan","minnesota",
                                       "missouri","mississippi","montana","north carolina","north dakota",
                                       "nebraska","new hampshire","new jersey","new mexico","nevada",
                                       "new york","ohio","oklahoma","oregon","pennsylvania","puerto rico",
                                       "rhode island","south carolina","south dakota","tennessee","texas",
                                       "utah","virginia","vermont","washington","wisconsin",
                                       "west virginia","wyoming"))
                       )
     #create an nx1 data.frame of state codes from source column
  st.x<-data.frame(state=x)
     #match source codes with codes from 'st.codes' local variable and use to return the full state name
  refac.x<-st.codes$full[match(st.x$state,st.codes$state)]
     #return the full state names in the same order in which they appeared in the original source
  return(refac.x)
 
}
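
The same lookup can be sketched more compactly with base R's built-in `state.abb` and `state.name` vectors. This is only an illustration: the helper name `abbr_to_lower_name` is hypothetical, and because the built-in vectors cover the 50 states only, DC and Puerto Rico are appended by hand here.

```r
# Hypothetical helper (not part of the original notebook): map two-letter
# codes to lowercase full names using R's built-in state tables.
abbr_to_lower_name <- function(codes) {
  # state.abb / state.name hold the 50 states only, so add DC and PR by hand
  all_abb  <- c(state.abb, "DC", "PR")
  all_name <- c(state.name, "District of Columbia", "Puerto Rico")
  # match() gives the position of each code in the table, or NA if absent
  tolower(all_name[match(codes, all_abb)])
}

example_codes <- c("TX", "DC", "GU")
example_names <- abbr_to_lower_name(example_codes)
print(example_names)  # "GU" is not in the table, so it comes back as NA
```

Codes that are not in the table come back as NA, which is worth keeping in mind when later steps compare the region column to a state name.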

\5. I created a new column in the csc data called "region" that holds the lowercase names of the states. I then print out the first ten state names in the region column of the csc data.

csc$region <- stateFromLower(csc$STABBR)
csc$region[1:10]

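\6. I created a new data frame below called csc_df that merges the csc and states data on their shared region column. Because the states data has many outline points per state, each college is repeated once for every outline point of its state. I then print out the first six rows of csc_df in a table.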

csc_df <- merge(csc, states, by = "region")
head(csc_df)
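
To see why the merged table repeats each college, here is a minimal sketch on made-up data. The two toy data frames and their values are assumptions for illustration, not rows from the College Scorecard or the maps data.

```r
# Toy stand-ins: two colleges in one state, and three outline points for it
colleges <- data.frame(INSTNM = c("College A", "College B"),
                       region = c("texas", "texas"),
                       stringsAsFactors = FALSE)
outline <- data.frame(region = rep("texas", 3),
                      long   = c(-100.0, -99.5, -99.0),
                      lat    = c(30.0, 30.5, 31.0),
                      order  = 1:3)

# merge() pairs every college with every outline point that shares its region
merged <- merge(colleges, outline, by = "region")
print(nrow(merged))  # 2 colleges x 3 outline points
```

A join like this is useful for drawing, but any per-college statistic should be computed before merging so that colleges are not counted once per outline point.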

\7. The code below creates a new data frame called csc2 that subsets the csc data to the colleges whose highest degree awarded is a bachelor's degree (HIGHDEG equal to 3). The head function prints out the first six rows of this subset of csc.

csc2 <- csc[csc$HIGHDEG == 3,]
head(csc2)

\8. Here I created a logical vector called tx that marks the colleges in csc2 that are in Texas, and a data frame called tx2 that holds the private nonprofit colleges (CONTROL equal to 2) in csc2. The s data frame subsets the Texas rows to the columns listed below. Its first six rows are shown in the table below.

tx <- csc2$region == "texas"
tx2 <- csc2[csc2$CONTROL == 2,]
s <- csc2[tx,c("UGDS_HISP", "FIRST_GEN", "FIRSTGEN_COMP_ORIG_YR6_RT", "INSTNM", "CONTROL")]

head(s)

\9. I created a logical vector called complete that marks the rows where UGDS_HISP, FIRST_GEN, and FIRSTGEN_COMP_ORIG_YR6_RT are all numeric, so that rows with NAs or non-numeric values can be dropped. I then use complete to subset s and print the first six rows to check that the non-numeric values are gone.

complete <- complete.cases(cbind(as.numeric(s[,1]), as.numeric(s[,2]), as.numeric(s[,3])))
complete[1:5]

s <- s[complete, c("UGDS_HISP", "FIRST_GEN", "FIRSTGEN_COMP_ORIG_YR6_RT", "INSTNM", "CONTROL")]
head(s)

Warning message in cbind(as.numeric(s[, 1]), as.numeric(s[, 2]), as.numeric(s[, :
"NAs introduced by coercion"
Warning message in cbind(as.numeric(s[, 1]), as.numeric(s[, 2]), as.numeric(s[, :
"NAs introduced by coercion"

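The coercion warnings above come from text entries that cannot be read as numbers. A minimal sketch of the same cleaning idea on a toy data frame follows; the column values are made up, and `suppressWarnings` is used here only to keep the example quiet.

```r
# Toy data: numbers stored as text, with two kinds of non-numeric markers
toy <- data.frame(FIRST_GEN = c("0.40", "0.57", "NULL"),
                  RATE      = c("0.38", "PrivacySuppressed", "0.30"),
                  stringsAsFactors = FALSE)

# as.numeric() turns anything that is not a number into NA
num <- suppressWarnings(sapply(toy, as.numeric))

# complete.cases() is TRUE only for rows with no NA in any column
keep <- complete.cases(num)
toy_clean <- toy[keep, ]
print(toy_clean)
```

Only the first toy row survives, because each of the other two rows has one entry that is not a number.
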
\10. Here I created a vector called cexVals that sets the point size for every row in the csc2 data, with larger points for schools in Texas. The pchVals vector sets the point shape: plus signs by default and filled circles for Texas schools. The colVals vector sets the color: medium grey by default and dark grey for Texas schools.

cexVals <- rep(0.5, nrow(csc2))
cexVals[csc2$region == "texas"] = 1
pchVals <- rep(3, nrow(csc2))
pchVals[csc2$region == "texas"] = 19
colVals <- rep(grey(0.5), nrow(csc2))
colVals[csc2$region == "texas"] <- grey(0.1)
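
Style vectors like these have one entry per row of the full data, so a plot of only some rows needs the matching entries. A minimal sketch on toy data (all names and values here are assumptions for illustration):

```r
# Toy data: four schools, two of them in Texas
toy <- data.frame(region  = c("texas", "ohio", "texas", "iowa"),
                  CONTROL = c(1, 2, 2, 2),
                  stringsAsFactors = FALSE)

# One plotting symbol per row: plus sign by default, filled circle for Texas
pch_all <- rep(3, nrow(toy))
pch_all[toy$region == "texas"] <- 19

# When plotting a subset of rows, subset the style vector the same way
keep    <- toy$CONTROL == 2
pch_sub <- pch_all[keep]
print(pch_sub)
```

Indexing the style vector with the same condition keeps each symbol attached to its own school.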

\11. Below I created two data frames that subset s, which holds the data for Texas colleges. sub contains the public Texas colleges and sub2 contains the private for-profit Texas colleges.

sub <- s[s$CONTROL == 1, c("UGDS_HISP", "FIRST_GEN", "FIRSTGEN_COMP_ORIG_YR6_RT", "INSTNM", "CONTROL")]
head(sub)

sub2 <- s[s$CONTROL == 3, c("UGDS_HISP", "FIRST_GEN", "FIRSTGEN_COMP_ORIG_YR6_RT", "INSTNM", "CONTROL")]
head(sub2)

\12. Using the plot function, I created a scatterplot of the percentage of first-generation students against the percentage of first-generation students who complete a bachelor's degree within six years at private nonprofit colleges. I use the size, shape, and color established in the code above, label the x- and y-axes, label the Texas schools by name, and draw a line with a slope of one. The points function adds red points for public institutions in Texas and blue points for private for-profit institutions in Texas.

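# Align the style vectors (built over all rows of csc2) with the rows of tx2
idx <- match(rownames(tx2), rownames(csc2))
plot(tx2$FIRST_GEN, tx2$FIRSTGEN_COMP_ORIG_YR6_RT, col=colVals[idx], pch=pchVals[idx], cex=cexVals[idx], xlab="PercFirstGen", ylab="FirstGenComp6yr", main="First-Generation Students in Private Nonprofit Colleges in Texas")
# Label the Texas schools at their plotted position: x = FIRST_GEN, y = completion rate
text(as.numeric(s$FIRST_GEN), as.numeric(s$FIRSTGEN_COMP_ORIG_YR6_RT) + 0.001, labels = s$INSTNM, pos = 1, cex = 0.5)
abline(0,1)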

points(sub$FIRST_GEN, sub$FIRSTGEN_COMP_ORIG_YR6_RT, col="red")
points(sub2$FIRST_GEN, sub2$FIRSTGEN_COMP_ORIG_YR6_RT, col="blue")

Warning message in xy.coords(x, y, xlabel, ylabel, log):
"NAs introduced by coercion"
Warning message in xy.coords(x, y, xlabel, ylabel, log):
"NAs introduced by coercion"

Scatterplot Argument¶

The scatterplot above shows that public Texas colleges have the highest percentages of first-generation students, at about 53% and 63%, but completion rates under 20%. Private for-profit Texas colleges also have a high percentage of first-generation students, but they have relatively high completion rates for first-generation students, ranging from 20% to 70%.

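\13. The code below creates a logical vector called logic that flags the missing (NA) entries in the completion-rate column. The perc vector then uses the tapply function to compute the mean completion rate for each state. Non-numeric entries become NA when converted with as.numeric, and na.rm=TRUE leaves them out of the mean.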

#pg46
logic <- is.na(csc2$FIRSTGEN_COMP_ORIG_YR6_RT)
perc <- tapply(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT[!logic]), INDEX=csc2$region[!logic], FUN=mean, na.rm=TRUE)
perc

Warning message in tapply(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT[!logic]), INDEX = csc2$region, :
"NAs introduced by coercion"

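As a small illustration of the grouping step, the sketch below computes per-state means with tapply on made-up values. The toy rates are assumptions, not Scorecard figures.

```r
# Toy data: rates stored as text, one of them suppressed
toy <- data.frame(region = c("texas", "texas", "ohio", "ohio"),
                  rate   = c("0.30", "PrivacySuppressed", "0.40", "0.50"),
                  stringsAsFactors = FALSE)

# Convert to numbers; the suppressed entry becomes NA
rate_num <- suppressWarnings(as.numeric(toy$rate))

# Mean rate per state, leaving NA entries out of each mean
state_mean <- tapply(rate_num, INDEX = toy$region, FUN = mean, na.rm = TRUE)
print(state_mean)
```

The values and the grouping vector must have the same length and order, which is why any filter applied to one has to be applied to the other.
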
\14. I created a new data frame called df_perc from the perc vector in the code above. I then added a column called region that holds the row names of df_perc, and I print df_perc to see how the data frame looks.

df_perc <- as.data.frame(perc)
df_perc$region <- rownames(df_perc)
df_perc

\15. The logic2 vector below flags the NAs in the perc column of df_perc. Indexing the perc column with logic2 then changes those NA values to 0.

logic2 <- is.na(df_perc$perc)
df_perc$perc[logic2] <- 0
df_perc
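
One consequence of this step is that a state with no data ends up with the lowest value. A minimal sketch with made-up numbers (the three named values are assumptions for illustration):

```r
# Toy state means: one state has no data at all
perc <- c(delaware = NA, mississippi = 0.22, oregon = 0.52)

# Replace the missing mean with 0, as in the step above
filled <- perc
filled[is.na(filled)] <- 0

# which.min() skips NA, so the answer changes once NA becomes 0
lowest_before <- names(which.min(perc))
lowest_after  <- names(which.min(filled))
print(c(before = lowest_before, after = lowest_after))
```

Leaving the value as NA instead would keep such a state out of any ranking, so which choice is better depends on how the result is going to be read.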

\16. I checked the summary of the six-year completion rate for first-generation students. The hist function draws a histogram with twenty breaks, a labeled x-axis, and a title.

summary(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT))
hist(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT), breaks=20, xlab= "Percent of First-Gen Students", main="First-Gen Completion Rates Within Six Years")

Warning message in summary(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT)):
"NAs introduced by coercion"

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.02766 0.27360 0.35300 0.38000 0.47460 0.85420     195

Warning message in hist(as.numeric(csc2$FIRSTGEN_COMP_ORIG_YR6_RT), breaks = 20, :
"NAs introduced by coercion"

\17. The histogram above shows the spread of the percentage of first-generation students who graduate from college with a bachelor's degree within 6 years. The spread looks roughly normal. Here is a description of which states are in each region:

1 for New England (CT, ME, MA, NH, RI, VT)

2 Mid East (DE, DC, MD, NJ, NY, PA)

3 Great Lakes (IL, IN, MI, OH, WI)

4 Plains (IA, KS, MN, MO, NE, ND, SD)

5 Southeast (AL, AR, FL, GA, KY, LA, MS, NC, SC, TN, VA, WV)

6 Southwest (AZ, NM, OK, TX)

7 Rocky Mountains (CO, ID, MT, UT, WY)

8 Far West (AK, CA, HI, NV, OR, WA)

9 Outlying Areas (AS, FM, GU, MH, MP, PR, PW, VI)

# stat='identity' stacks the institution-level rates, so each bar is their sum
ggplot(csc2, aes(x=factor(REGION2), y=as.numeric(FIRSTGEN_COMP_ORIG_YR6_RT), fill = factor(REGION2))) + geom_bar(stat='identity') +
    labs(x="Region") +
    labs(y="Sum of Institution Completion Rates") +
    labs(title="Summed First-Gen Six-Year Completion Rates by Region in the U.S.")

Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
Warning message:
"Removed 195 rows containing missing values (position_stack)."

\18. The bar chart above shows that Region 5 has the largest total and Region 9 the smallest. Because each bar adds up the completion rates of the individual institutions in a region, its height also reflects how many institutions the region has, not only how often first-generation students finish within 6 years. This is an interesting observation considering that Region 5 contains AL, AR, FL, GA, KY, LA, MS, NC, SC, TN, VA, and WV.
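
Stacked bars of institution-level rates grow with the number of institutions, so it can help to compare sums with means. A minimal sketch on made-up rates (the toy values are assumptions, not Scorecard data):

```r
# Toy data: region 5 has three schools with modest rates,
# region 9 has a single school with a high rate
toy <- data.frame(REGION2 = c(5, 5, 5, 9),
                  rate    = c(0.30, 0.35, 0.25, 0.60))

# The sum rewards having many schools; the mean does not
totals <- tapply(toy$rate, toy$REGION2, sum)
means  <- tapply(toy$rate, toy$REGION2, mean)
print(rbind(totals, means))
```

In this toy case region 5 has the larger total but the smaller mean, which is the kind of difference a summed bar chart can hide.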

\19. The code below guards against negative values by setting any rate below 0 to 0. The interval vector then shows the four equal-width intervals that the cut function creates from the perc column.

df_perc$perc[df_perc$perc<0] = 0
interval <- unique(cut(df_perc$perc, 4))
interval

\20. The next set of code creates a breaks column from df_perc$perc, labeled according to the intervals created above.

df_perc$breaks = cut(df_perc$perc, 4, labels = c("0-.132", ".132-.264", ".264-.396", ".396-.529"))
head(df_perc)
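
For readers unfamiliar with cut, here is a minimal sketch of how it bins values into four equal-width intervals. The five input values and the label names are made up for illustration.

```r
# Toy rates spanning roughly the same range as the state means
rates <- c(0, 0.145, 0.25, 0.34, 0.528)

# cut(x, 4) splits the range of x into four equal-width intervals
bins <- cut(rates, 4)
print(levels(bins))  # the interval boundaries R chose

# The same binning with custom labels instead of interval names
labelled <- cut(rates, 4, labels = c("lowest", "low", "high", "highest"))
print(table(labelled))
```

Because the intervals depend on the minimum and maximum of the input, adding or removing an extreme value, such as a 0, shifts every boundary.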

\21. choro_df is created by merging the states data with the df_perc data by region, and then its first six rows are printed.

choro_df <- merge(states, df_perc, by = "region")
head(choro_df)

\22. Next, choro is ordered and the first six rows are printed.

choro <- choro_df[order(choro_df$order), ]
head(choro)
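
The ordering step matters because merge can rearrange rows, while polygon outlines need their points in drawing order. A minimal sketch on toy data (all values are assumptions for illustration):

```r
# Toy outline: four points in drawing order, alternating between two regions
outline <- data.frame(region = c("b", "a", "b", "a"),
                      long   = c(1, 2, 3, 4),
                      order  = 1:4,
                      stringsAsFactors = FALSE)
rates <- data.frame(region = c("a", "b"), perc = c(0.3, 0.5),
                    stringsAsFactors = FALSE)

# merge() sorts the result by the key column, which can scramble drawing order
joined <- merge(outline, rates, by = "region")
print(joined$order)

# Sorting by the saved order column restores the original sequence
restored <- joined[order(joined$order), ]
print(restored$order)
```

Without the re-sort, a polygon drawn from the merged rows could connect outline points in the wrong sequence.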

\23. Now that the data is cleaned, we are finally ready to plot it on a map. I used qplot with the longitude and latitude of the choro data and filled the states according to the breaks created earlier. I set a title using main, draw a border around each state so that states are easier to find, and use the Spectral palette to give each interval its own color.

qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon", 
      main = "College Completion Rates for First-Generation Students") +  borders("state", size = 0.5) +
    scale_fill_brewer(name = "College Completion", palette = "Spectral")

Map Analysis¶

Red = Delaware

Orange = Washington, South Dakota, and Mississippi

Green = Montana, Idaho, Wyoming, North Dakota, Nevada, Utah, Colorado, New Mexico, Texas, Oklahoma, Kansas, Nebraska, Arkansas, Louisiana, Michigan, Maine, New York, Massachusetts, New Jersey, Maryland, Virginia, West Virginia, North Carolina, Tennessee, South Carolina, Georgia, Alabama, and Florida

Blue = Oregon, California, Arizona, Minnesota, Iowa, Missouri, Wisconsin, Illinois, Indiana, Kentucky, Ohio, Pennsylvania, Connecticut, Rhode Island, Vermont, and New Hampshire

I decided to focus my time on analyzing the red and orange states and looking into why these states have rates between 0 and 26%. First-generation students tend to be racial minorities, to come from low-income families, and often to come from single-parent households. These characteristics make it more difficult for first-generation students to complete college. Many first-generation students feel pressure to drop out of school because of family problems with money, stress and anxiety, a sense of not belonging, and off-campus employment. It is easier to get to the root of why completion rates are low for first-generation students in general than to explain why the rates are especially low in certain states.

Conclusion¶

After looking closely at my data, I found that Delaware has no colleges in this subset, that is, none whose highest degree awarded is a bachelor's degree. This is most likely why the state appears to have the lowest rate of first-generation students completing college: its missing average was set to 0. As for the orange states, which have completion rates between 13% and 26%, there is enough data in the College Scorecard for their four-year institutions. In Washington, the Robert B Miller College has a completion rate of 53%, which must have pulled the state average up, while Seattle Central College has a completion rate of less than 1% even though first-generation students make up 43% of its student population. In South Dakota, 30% of first-generation students at Presentation College graduate. In Mississippi, one out of three colleges did not release information about the percentage of first-generation students who completed college, and Rust College has the lowest completion rate for first-generation students at 15%. I definitely limited my data by only looking at four-year institutions, but I think the average percentage for each state accurately represents that state.

Bibliography¶

Boyd, Vivian S., Linda K. Gast, Patricia F. Hunt, Alice Mitchell, and Wendy Wilson. "Why Some Students Leave College During Their Senior Year." Journal of College Student Development 53.5 (2012): 737-42. Web.

Riggs, Liz. "First-Generation College-Goers: Unprepared and Behind." The Atlantic, 31 Dec. 2014, http://www.theatlantic.com/education/archive/2014/12/the-added-pressure-faced-by-first-generation-students/384139/. Accessed 7 May 2017.

Wilbur, T. G., and V. J. Roscigno. "First-generation Disadvantage and College Enrollment/Completion." Socius: Sociological Research for a Dynamic World 2.0 (2016): 1-11. Web.

Wolfman-Arent, Avi. "First Year, First Generation: Overwhelmed by demands, buoyed by encouragement." newsworks, 28 Jun. 2016, http://www.newsworks.org/index.php/local/education/94947-first-year-first-generation-seans-spot. Accessed 7 May 2017.

Zinshteyn, Mikhail. "How to Help First-Generation Students Succeed." The Atlantic, 13 Mar. 2016, http://www.theatlantic.com/education/archive/2016/03/how-to-help-first-generation-students-succeed/473502/. Accessed 7 May 2017.

region	UNITID	OPEID	OPEID6	INSTNM	CITY	STABBR	ZIP	CONTROL	LATITUDE	...	UGDS_HISP	FIRSTGEN_COMP_ORIG_YR6_RT	FIRST_GEN	HIGHDEG	REGION2	long	lat	group	order	subregion
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.46201	30.38968	1	1	NA
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.48493	30.37249	1	2	NA
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.52503	30.37249	1	3	NA
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.53076	30.33239	1	4	NA
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.57087	30.32665	1	5	NA
alabama	102076	103800	1038	Snead State Community College	Boaz	AL	35957-0734	1	34.201247	...	0.0825	0.070063694	0.545154911	2	5	-87.58806	30.32665	1	6	NA

	UNITID	OPEID	OPEID6	INSTNM	CITY	STABBR	ZIP	CONTROL	LATITUDE	LONGITUDE	ADM_RATE_ALL	UGDS_HISP	FIRSTGEN_COMP_ORIG_YR6_RT	FIRST_GEN	HIGHDEG	REGION2	region
8	100812	100800	1008	Athens State University	Athens	AL	35611	1	34.805625	-86.96514	NULL	0.0191	0.579741379	0.471594798	3	5	alabama
11	100937	101200	1012	Birmingham Southern College	Birmingham	AL	35254	2	33.515453	-86.853636	0.533935018	0.0195	0.238095238	0.2	3	5	alabama
13	101073	1055400	10554	Concordia College Alabama	Selma	AL	36701	2	32.42443	-87.023531	0.532846715	0.0373	PrivacySuppressed	0.533477322	3	5	alabama
24	101435	101900	1019	Huntingdon College	Montgomery	AL	36106-2148	2	32.350939	-86.285313	0.583855254	0.0252	0.524137931	0.327559055	3	5	alabama
31	101541	102300	1023	Judson College	Marion	AL	36756	2	32.630526	-87.316127	0.652542373	0.016	0.314285714	0.460580913	3	5	alabama
36	101675	102800	1028	Miles College	Fairfield	AL	35064-2621	2	33.481306	-86.908605	NULL	0.0028	0.193211488	0.42406015	3	5	alabama

	UGDS_HISP	FIRST_GEN	FIRSTGEN_COMP_ORIG_YR6_RT	INSTNM	CONTROL
3648	0.3275	0.402479339	0.382417582	The Art Institute of Houston	3
3659	0.4805	0.573844316	0.630573248	Remington College-Dallas Campus	2
3661	0.3714	0.502638522	PrivacySuppressed	Brazosport College	1
3675	0.1845	0.415	0.302325581	Dallas Christian College	2
3680	0.3629	0.557563242	0.431472081	Career Point College	3
3711	0.2269	0.497285751	0.353021354	ITT Technical Institute-Arlington	3

	UGDS_HISP	FIRST_GEN	FIRSTGEN_COMP_ORIG_YR6_RT	INSTNM	CONTROL
3648	0.3275	0.402479339	0.382417582	The Art Institute of Houston	3
3659	0.4805	0.573844316	0.630573248	Remington College-Dallas Campus	2
3675	0.1845	0.415	0.302325581	Dallas Christian College	2
3680	0.3629	0.557563242	0.431472081	Career Point College	3
3711	0.2269	0.497285751	0.353021354	ITT Technical Institute-Arlington	3
3712	0.322	0.497285751	0.353021354	ITT Technical Institute-Houston West	3

	UGDS_HISP	FIRST_GEN	FIRSTGEN_COMP_ORIG_YR6_RT	INSTNM	CONTROL
3734	0.5169	0.532457496	0.141333333	Midland College	1
4855	0.9401	0.633025431	0.110193974	South Texas College	1

	perc	region
alabama	0.3284414	alabama
alaska	0.5147059	alaska
arizona	0.4110283	arizona
arkansas	0.3376674	arkansas
california	0.4217063	california
colorado	0.2903032	colorado
connecticut	0.4275142	connecticut
delaware	NA	delaware
district of columbia	0.2436604	district of columbia
florida	0.3792571	florida
georgia	0.3201671	georgia
hawaii	0.4316471	hawaii
idaho	0.3713924	idaho
illinois	0.3990149	illinois
indiana	0.4464658	indiana
iowa	0.4935637	iowa
kansas	0.3093231	kansas
kentucky	0.3969361	kentucky
louisiana	0.3692274	louisiana
maine	0.3314711	maine
maryland	0.3530214	maryland
massachusetts	0.3898265	massachusetts
michigan	0.3935320	michigan
minnesota	0.4094383	minnesota
mississippi	0.2181226	mississippi
missouri	0.4137372	missouri
montana	0.3121356	montana
nebraska	0.3647162	nebraska
nevada	0.2879391	nevada
new hampshire	0.4834302	new hampshire
new jersey	0.2701753	new jersey
new mexico	0.3039757	new mexico
new york	0.3443761	new york
north carolina	0.3791234	north carolina
north dakota	0.3344215	north dakota
ohio	0.4142343	ohio
oklahoma	0.3291454	oklahoma
oregon	0.5183400	oregon
pennsylvania	0.5280955	pennsylvania
puerto rico	0.1450674	puerto rico
rhode island	0.4746050	rhode island
south carolina	0.3311112	south carolina
south dakota	0.2543163	south dakota
tennessee	0.3408331	tennessee
texas	0.3419248	texas
utah	0.3476715	utah
vermont	0.5040455	vermont
virginia	0.3901217	virginia
washington	0.2436614	washington
west virginia	0.3353204	west virginia
wisconsin	0.4078479	wisconsin
wyoming	0.3402299	wyoming