Marissa's CS360 Final Project
About Me
Name: Marissa Masangcay
Email: mjmasangcay@usfca.edu
Bio:
I was born and raised in the heart of Silicon Valley, in San Jose, CA, and I've been surrounded by technology from the get-go. I'd like to think this is what eventually led me into Computer Science much later on, although growing up I never would have guessed that this is what I would end up doing.
I stumbled into the world of Computer Science because my two roommates were building a startup out of our apartment, and it blew my mind that they could build a whole company and product using nothing but their laptops. I became very interested in how they did it and started looking into Computer Science. It turns out I loved learning how to code, and it all took off from there. I applied to the CS program at USF, and two years later, here I am!
Original Data Set
My data set can be found here.
The licensing for my data set can be found here.
This data set contains the FBI's hate crime statistics for 2013 in the United States. Each row records the state where the crimes took place, as well as the type and name of the setting in which they happened, such as a city or a university. The data set breaks the reported crimes down into seven categories of hate crime: race, religion, sexual orientation, ethnicity, disability, gender, and gender identity. It also records when the crimes happened by quarter of the year rather than by exact date, so each row holds the number of crimes that occurred within each of the four quarters. Overall, the data consists of 15 columns and 1,827 lines.
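To make the shape of each row concrete, here is a minimal Python sketch of one record with its 15 columns, plus a helper that totals each hate crime category per state. The field names (`race`, `q1`, and so on), the `AgencyRecord` class, and the `totals_by_state` helper are hypothetical and not part of the original data set or project; the two example rows use invented numbers, not FBI figures.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical names for the seven hate crime categories.
CATEGORIES = ("race", "religion", "sexual_orientation", "ethnicity",
              "disability", "gender", "gender_identity")


@dataclass
class AgencyRecord:
    """One row: a reporting agency and its 2013 hate crime counts (15 fields)."""
    state: str
    agency_type: str   # the kind of setting, e.g. a city or a university
    agency_name: str
    population: int
    race: int
    religion: int
    sexual_orientation: int
    ethnicity: int
    disability: int
    gender: int
    gender_identity: int
    q1: int            # number of crimes reported in each quarter
    q2: int
    q3: int
    q4: int

    def category_counts(self) -> dict[str, int]:
        return {c: getattr(self, c) for c in CATEGORIES}

    def quarterly_total(self) -> int:
        return self.q1 + self.q2 + self.q3 + self.q4


def totals_by_state(records: list[AgencyRecord]) -> dict[str, dict[str, int]]:
    """Sum each category over every agency that reported from the same state."""
    totals = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for record in records:
        for category, count in record.category_counts().items():
            totals[record.state][category] += count
    return dict(totals)


# Two invented rows, for illustration only.
rows = [
    AgencyRecord("State A", "City", "City 1", 50000,
                 3, 1, 0, 1, 0, 0, 0,   # race .. gender_identity
                 1, 2, 1, 1),           # q1 .. q4
    AgencyRecord("State A", "University", "College 1", 0,
                 1, 0, 1, 0, 0, 0, 0,
                 0, 1, 1, 0),
]
print(totals_by_state(rows)["State A"]["race"])  # 4
print(rows[0].quarterly_total())                 # 5
```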
Data Set Preprocessing
To preprocess my data I used Trifacta Wrangler. I didn't have much preprocessing to do: I only had to fill in some missing values and strip the commas out of the Population column. The data set stayed the same size; I did not remove any rows or columns. My Trifacta transformation script is as follows:
splitrows col: column1 on: '\r' quote: '\"'
split col: column1 on: ',' limit: 14 quote: '\"'
settype col: Population type: 'Integer'
set col: column_3rd_quarter value: 0 row: empty([column_3rd_quarter])
set col: column_2nd_quarter value: 0 row: empty([column_2nd_quarter])
set col: column_1st_quarter value: 0 row: empty([column_1st_quarter])
set col: column_4th_quarter value: 0 row: empty([column_4th_quarter])
set col: Population value: 0 row: empty([Population])
set col: Gender_Identity value: null() row: empty([Gender_Identity])
set col: Gender value: null() row: empty([Gender])
settype col: Population type: 'String'
replace col: Population on: ',' with: ''
replace col: Population on: '\"' with: ''
replace col: Population on: '\"' with: ''
replace col: Population on: ',' with: ''
settype col: Ethnicity type: 'Integer'
settype col: Religion type: 'Integer'
settype col: Disability type: 'Integer'
settype col: Gender type: 'Integer'
settype col: Gender_Identity type: 'Integer'
set col: Gender_Identity value: 0 row: empty([Gender_Identity])
set col: Gender value: 0 row: empty([Gender])
set col: Gender value: 0 row: mismatched(Gender, ['Integer'])
As the script shows, I had to change the Population column to a string to be able to remove the commas from the numbers. I ran the comma replacement twice (and did the same for the quote characters) because I couldn't find a replaceAll-style command and each replace seemed to remove only one match; two passes are enough for populations in the millions, which contain two commas. I also had to convert the other crime-category columns from strings to integers, because Trifacta read them in as strings from the CSV. Other than filling in empty cells and stripping out commas, I didn't have to do any other preprocessing.
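For comparison, the same cleaning steps can be sketched in plain Python with the standard `csv` module. This is not the project's actual pipeline (the preprocessing was done in Trifacta), and the column names and the `to_int` and `preprocess` helpers are hypothetical. Note that Python's `str.replace` replaces every occurrence by default, so a single call removes all the commas in a number.

```python
import csv
import io

# Hypothetical header names, modeled on the columns the Trifacta script touches.
INT_COLUMNS = ["Population", "Race", "Religion", "Sexual_orientation", "Ethnicity",
               "Disability", "Gender", "Gender_Identity",
               "1st_quarter", "2nd_quarter", "3rd_quarter", "4th_quarter"]


def to_int(value: str) -> int:
    """Strip every comma and quote; empty or non-numeric cells become 0."""
    cleaned = value.replace(",", "").replace('"', "").strip()
    # Mirrors the script's rule of setting empty or mismatched values to 0.
    try:
        return int(cleaned)
    except ValueError:
        return 0


def preprocess(csv_text: str) -> list[dict]:
    """Parse the CSV and convert the numeric columns that are present to ints."""
    # newline="" lets the csv module handle \r, \n, or \r\n line endings.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    rows = []
    for row in reader:
        for col in INT_COLUMNS:
            if col in row:
                row[col] = to_int(row[col] or "")
        rows.append(row)
    return rows


# A tiny invented sample: the quoted population contains two commas.
sample = 'State,Agency_name,Population,Gender,1st_quarter\nState A,City 1,"1,234,567",,2\n'
cleaned = preprocess(sample)
print(cleaned[0]["Population"], cleaned[0]["Gender"], cleaned[0]["1st_quarter"])  # 1234567 0 2
```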
Motivation
I chose this data set because I'm really interested in the social sciences. Although this data set is three years old now, I think it's really interesting to see the mindset and actions of the general population of the United States. Recorded incidents of people's actions are hard to refute, so I thought it would be interesting to show how Americans are treating one another. Although it's a grim and somewhat depressing topic, I think it's important to raise awareness that hate crimes still happen today. I think we as Americans like to believe that we are good people who no longer hold prejudices, but according to this data set we very much still do. I hope to find which types of hate crimes are most common and where hate crimes in general are most concentrated.
Findings
Symbols Map
Stacked and Normalized Bar Chart
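In a normalized stacked bar chart, each bar's segments are scaled so they sum to 1, which makes the mix of categories comparable across bars of very different sizes. Below is a minimal Python sketch of that normalization step, assuming (this is not stated in the original) that each bar is one state and each segment one hate crime category; the `normalize` and `normalize_by_state` functions and the counts are hypothetical.

```python
def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Scale one bar's category counts so the segments sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # An empty bar: avoid dividing by zero and keep every segment at 0.
        return {category: 0.0 for category in counts}
    return {category: count / total for category, count in counts.items()}


def normalize_by_state(totals: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Normalize every state's bar independently of the others."""
    return {state: normalize(counts) for state, counts in totals.items()}


# Invented counts for illustration only.
example = {
    "State A": {"race": 3, "religion": 1, "ethnicity": 1},
    "State B": {"race": 0, "religion": 0, "ethnicity": 0},
}
shares = normalize_by_state(example)
print(shares["State A"]["race"])  # 0.6
print(shares["State B"]["race"])  # 0.0
```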
Parallel Coordinates