7/5/17 Notes (Adding a column with the year in it and merging all the files together)
Trying to make a column in the ConfidenceS14 dataframe that has “2013-14” in each cell.
- I tried a command I found online. Here is what I typed:
- ConfidenceS14['Year'] = Series(np.random.randn(sLength), index=ConfidenceS14.index)
- I got an error saying “Series” was not defined. I’m not sure what I should put there.
- I tried putting ‘Year’ where Series was and ‘[39]’ where sLength was (39 is the number of rows). This gave me an error that said “need integer”.
- Tried changing command to this:
- ConfidenceS14['Year'] = 'Year'([2013-14]([39]), index=ConfidenceS14.index)
- I got an error that says “'list' object is not callable”
- I tried a simpler command I found online and it worked!
- ConfidenceS14['Year'] = '2013-14'
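For reference, here is a minimal sketch of why the simple command works and what the longer one was missing. The dataframe is a made-up stand-in (the names and the "Year_from_series" column are hypothetical, not from the real survey file). Two things are worth noting: Series lives inside pandas, so it has to be written as pd.Series unless it is imported by name, and the year needs quotes, because 2013-14 without quotes is a subtraction that gives 1999.

```python
import pandas as pd

# Hypothetical stand-in for ConfidenceS14; the real survey columns are not shown in these notes.
ConfidenceS14 = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cal"],
    "Last Name": ["Lee", "Ortiz", "Shah"],
})

# Assigning a single string fills every row of the new column with that string.
ConfidenceS14["Year"] = "2013-14"

# The longer form builds the column explicitly.
# Series must be reached through the pd. prefix (or imported with `from pandas import Series`).
ConfidenceS14["Year_from_series"] = pd.Series("2013-14", index=ConfidenceS14.index)

# Without quotes, Python treats the year as arithmetic.
unquoted = 2013 - 14

print(ConfidenceS14)
print(unquoted)
```

Both columns end up identical, so the one-line version is the easier choice when every row gets the same value.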
Merged ConfidenceS14 with ConfidenceS15. Here is the command I used:
- Confidence1415 = pd.merge(left=ConfidenceS14,right=ConfidenceS15, on=['First Name','Last Name'], how='outer')
- I used an outer merge so I could keep all the data, even the rows that didn’t match up.
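A small sketch of what an outer merge keeps, using two made-up dataframes (s14, s15, and the Score column are hypothetical). The indicator=True argument is an optional extra that was not in the command above; it adds a _merge column saying whether each row came from the left file, the right file, or both.

```python
import pandas as pd

# Hypothetical survey data for two years; only "Ben Ortiz" appears in both.
s14 = pd.DataFrame({
    "First Name": ["Ana", "Ben"],
    "Last Name": ["Lee", "Ortiz"],
    "Score": [3, 4],
})
s15 = pd.DataFrame({
    "First Name": ["Ben", "Cal"],
    "Last Name": ["Ortiz", "Shah"],
    "Score": [5, 2],
})

# how="outer" keeps rows from both sides, filling in NaN where there is no match.
# indicator=True adds a "_merge" column recording where each row came from.
merged = pd.merge(
    left=s14,
    right=s15,
    on=["First Name", "Last Name"],
    how="outer",
    indicator=True,
)

print(merged)
```

Because both files have a Score column that is not part of the merge key, pandas renames them Score_x and Score_y in the result.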
When I tried to merge the file with every survey year to the Attendance file, I got an error that said “Year Participated”. I think this might be because there are too many “Year Participated” columns in the file with all the surveys. So instead of merging all the surveys together and then merging that with the attendance, I’m going to merge the survey files for each year with the attendance, then merge the files from each year together to make one file to rule them all.
- I’m still having a problem with “Year Participated”. I’m going to restart the kernel, clear the outputs, and try again. This worked!
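One possible explanation for the “Year Participated” error, sketched with made-up data (the dataframes here are hypothetical, and this is a guess at the cause, not a confirmed diagnosis): when two files share a column that is not part of the merge key, pandas renames the copies with _x and _y, so a column called exactly “Year Participated” no longer exists afterwards. Stacking the yearly files with pd.concat is shown as an alternative that keeps a single column; it is not the approach used in these notes.

```python
import pandas as pd

# Hypothetical yearly survey files that both carry a "Year Participated" column.
surveys14 = pd.DataFrame({
    "First Name": ["Ana"],
    "Last Name": ["Lee"],
    "Year Participated": ["2013-14"],
})
surveys15 = pd.DataFrame({
    "First Name": ["Ana"],
    "Last Name": ["Lee"],
    "Year Participated": ["2014-15"],
})

# Merging side by side: the shared non-key column is renamed with _x / _y suffixes.
all_surveys = pd.merge(surveys14, surveys15, on=["First Name", "Last Name"], how="outer")
merged_columns = list(all_surveys.columns)
has_plain_column = "Year Participated" in all_surveys.columns

# Alternative: stack the files on top of each other, one row per person per year.
stacked = pd.concat([surveys14, surveys15], ignore_index=True)

print(merged_columns)
print(has_plain_column)
print(stacked)
```

Printing the column names right before a merge is a quick way to check whether the key column still has the name the merge expects.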
I merged all the files, and when I searched for a name I got multiple rows with the same name for the same year, so I need to edit the command I used earlier that deletes duplicates with the same name and year. This worked!
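The earlier duplicate-deleting command is not shown in these notes, so the following is only a sketch of one way to do it, assuming the combined file has First Name, Last Name, and Year columns (the data and the name "combined" are made up).

```python
import pandas as pd

# Hypothetical combined file: "Ana Lee" appears twice for 2013-14.
combined = pd.DataFrame({
    "First Name": ["Ana", "Ana", "Ana", "Ben"],
    "Last Name": ["Lee", "Lee", "Lee", "Ortiz"],
    "Year": ["2013-14", "2013-14", "2014-15", "2013-14"],
})

# Treat rows as duplicates only when the name AND the year both match,
# and keep the first copy of each.
deduped = combined.drop_duplicates(
    subset=["First Name", "Last Name", "Year"],
    keep="first",
).reset_index(drop=True)

print(deduped)
```

Including Year in subset is what keeps the same person's rows from different years; leaving it out would collapse them into one.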