Posts

Showing posts from June, 2017

6/30/17 Notes (trying to use strip function to get more names to appear)

Still working on the similarity function. I’ve asked around and tried finding a solution online, but I haven’t found anything that will merge sequences depending on the sequence matcher ratio. Instead, I’m going to try to find a function that will edit a list so that it matches another list; then, once the two lists match, I’ll merge the files. I asked my friend Brooke Hossley for help, and she suggested using a trim function to get rid of the spaces after the names, which would definitely help make more names appear. The command is .lstrip(). Here is what I typed: ConfidenceS14['First Name'].lstrip() This did not work. I got an error that says “'Series' object has no attribute 'lstrip'”. I googled that error, and apparently I need to add “.str” before “.lstrip”. This worked! Here is the full command: ConfidenceS14['First Name'].str.lstrip() I’m going to try to search one of the names Torri couldn’t find because of an ext...
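A minimal sketch of the trimming step, assuming a small made-up table in place of the real ConfidenceS14 file. One thing worth noting: .str.lstrip() only removes spaces at the start of a value, so for spaces after a name .str.rstrip() or .str.strip() (both sides) is the closer fit. The call returns a new column, so it has to be assigned back for the change to stick.

```python
import pandas as pd

# Hypothetical stand-in for the ConfidenceS14 survey file.
ConfidenceS14 = pd.DataFrame({"First Name": ["Cierra ", " Raynaia", "Ella"]})

# .str.strip() removes leading and trailing whitespace from every value.
# The result is a new Series, so assign it back to keep the cleaned names.
ConfidenceS14["First Name"] = ConfidenceS14["First Name"].str.strip()

cleaned_names = ConfidenceS14["First Name"].tolist()
print(cleaned_names)
```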

06/29/17 Notes (Trying to find a way to merge using sequence matcher)

Still working on the similarity function. As a reference, I am going to measure the similarity between the names Torri couldn’t get to appear because one name had a space on the end and the other didn’t. That way, I can test whether the similarity function works. Here are the names I tested and their similarity: Cierra Black, .96; Raynaia Gilchrist, .9714285714285714. I’m going to use .96. I tried a command that I found online. It uses difflib’s get_close_matches to find comparable items in two strings, then the person joined the two files. Here is what I typed (I didn’t try to merge yet, I just wanted to see which names the code would mark as similar): import difflib difflib.get_close_matches ConfidenceS14['Full Name'].index = ConfidenceS14['Full Name'].index.map(lambda x: difflib.get_close_matches(x, Attendance['Full Name'].index)[0]) This gave me an error.
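A minimal sketch of get_close_matches on plain lists of names, using the .96 cutoff from above. The lists and the helper closest_name are made up for illustration. The actual error message isn't shown, so this is only a guess at two possible causes: .index holds row labels rather than the names themselves, and [0] fails whenever no match is returned. The helper returns None in that case instead.

```python
import difflib

# Hypothetical name lists; in the real files these would come from the
# 'Full Name' columns, not from the row index.
attendance_names = ["Cierra Black", "Raynaia Gilchrist", "Ella Hubbell"]
survey_names = ["Cierra Black ", "Raynaia Gilchrist ", "Evan O'Brien"]


def closest_name(name, candidates, cutoff=0.96):
    """Return the best match at or above the cutoff, or None if there is none."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Map every survey name to its closest attendance name (or None).
matched = {name: closest_name(name, attendance_names) for name in survey_names}
for survey_name, attendance_name in matched.items():
    print(repr(survey_name), "->", repr(attendance_name))
```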

6/29/17 Notes (Getting more names to appear when files merge)

Working on finding out why there are names missing when Torri merges the survey data with the attendance data. The S14 data file Torri used has 2013-2014 as the year participated, which is correct. However, there are 66 students on this file, which would mean that all but one student took the surveys, and that doesn’t make sense. When I made my own file of 2013-2014 students I also got 66 students... I tried displaying “Combo_” but the code only gave me the column headers and no names. The same thing happened when I tried to display “Combo_2” and “Combo_4”. “Final” worked. All students that had a survey number had a number for each survey. There are 17 students that have numbers for the survey data in this file. Apparently the reason why some kids aren’t appearing is that there’s a space after their names in one file but not the other, which messes with the merging command, but there are still names missing. I compared the students with survey data on the “Final” ...

6/28/17 Notes (Searching names with apostrophes in them)

I tried the command I used yesterday, but this time using the “Last Name” column, because I know its format is text since I fixed it earlier yesterday. This did not work. I tried translating As to Ds to see if this command could only translate letters and not symbols. This did not work. I tried making Attend = Attendance['Full Name'], then str = “Attend”, hoping that this would open up the file I wanted to translate, but instead it translated the word “Attend”. Tried this command: from string import maketrans intab = "'" outtab = "_" trantab = maketrans(intab, outtab) print Attendance['Full Name'].translate(trantab) This did not work. Tried “Attendance.translate(trantab)” instead of “Attendance['Full Name'].translate(trantab)”. This did not work. I tested making str = “Evan O’Brien” and that worked. I think the problem is that the string has to be typed out and I need a command that can go into a fil...
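A minimal sketch of one way to reach into a whole column, assuming a small made-up table in place of the real Attendance file. translate() works on one typed-out string, while the pandas .str accessor applies a string method to every value in a column. This swaps in .str.replace() for maketrans/translate; it is not the command from the post.

```python
import pandas as pd

# Hypothetical stand-in for the Attendance file.
Attendance = pd.DataFrame({"Full Name": ["Evan O'Brien", "Ella Hubbell"]})

# Replace every apostrophe with an underscore, value by value.
# regex=False treats the apostrophe as a literal character.
Attendance["Full Name"] = Attendance["Full Name"].str.replace("'", "_", regex=False)

replaced_names = Attendance["Full Name"].tolist()
print(replaced_names)
```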

6/27/17 Notes (Fixing case sensitivity and deleting apostrophes)

Still trying to make the names in the attendance file all uppercase. Tried typing the number of the First Name and Last Name columns in between the brackets instead of the column name. This did not work. Tried typing the letter of the First Name and Last Name columns in between the brackets instead of the column name. This did not work. I accidentally typed two commands to make the “First Name” column all caps instead of both the first and last name columns. This worked; I only got an error when I tried to make the last name column all caps, which means that there is something wrong with that column. I am going to look through it to make sure there aren’t any numbers in that column. I found a name with parentheses in it, so I deleted them. This did not work. I tried changing the format of all the last name cells from “general” to “text”. This did not work (the format of the first name column is “general” and it works fine). I added a “1” to Bilal so that the cell read “Bila...

6/26/17 Notes (Case sensitivity not working)

Added column names for the Attitudes Survey for all years. Working with Torri on getting the sequence matcher to work. Here is the example Dr. McColgan gave me: >>> from difflib import SequenceMatcher as SM >>> s1 = ' It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.' >>> s2 = ' It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines.' >>> SM(None, s1, s2).ratio() 0.9112903225806451 Here is what I typed to work with my files/code: s1 = CareerS14['Full Name'] s2 = ConfidenceS14['Full Name'] SM(None,s1,s2,).ratio() This gave me a number, but I think it compared the ENTIRE column instead of the individual rows. Added the code Torri used to change all letters in the full name to capital letters. This works for all files except for t...
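A minimal sketch of comparing names row by row instead of column against column. When SequenceMatcher is handed two whole columns, each name counts as a single item in the sequence, so the ratio describes how alike the two columns are overall. Pairing the values up and scoring each pair gives one ratio per row. The two lists here are made up and stand in for the 'Full Name' columns of CareerS14 and ConfidenceS14.

```python
from difflib import SequenceMatcher as SM

# Hypothetical stand-ins for the two 'Full Name' columns, in the same row order.
career_names = ["Cierra Black", "Raynaia Gilchrist "]
confidence_names = ["Cierra Black ", "Raynaia Gilchrist"]

# zip() pairs the names up row by row; each pair gets its own ratio.
row_ratios = [SM(None, a, b).ratio() for a, b in zip(career_names, confidence_names)]

for a, b, ratio in zip(career_names, confidence_names, row_ratios):
    print(repr(a), repr(b), ratio)
```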

6/22/17 Notes

I resaved the Google master file to my computer so that the code has the updated version of the file. I got an error saying the file didn’t exist, so I opened the file on my computer and saved it again, making sure it was saved as a csv file. This did not work. I copied and pasted the file name into the code to make sure it was correct. This did not work. I renamed the file to “Attendance 2003-2017” (I’m wondering if the parentheses in the original title messed it up). This did not work. I restarted the kernel and cleared all outputs. This did not work. I deleted all the versions of this file I had and redownloaded it. This worked! Working on being able to search names of classes in all the columns. Tried using this command: “Studentinfo14[Studentinfo14['Pm S1'+'Am S1'].str.contains('Check', na=False)]” I searched “check” because I know that it appears in both columns. This did not work. I am going to combine all the class columns in...
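A minimal sketch of searching two class columns at once, assuming a small made-up table in place of Studentinfo14 (the student and class values are placeholders). In the command above, 'Pm S1'+'Am S1' joins the two strings into the single label 'Pm S1Am S1', which is not a column name. An alternative to combining the columns is to check each column separately and keep any row where at least one of them matches.

```python
import pandas as pd

# Hypothetical stand-in for Studentinfo14; all values are placeholders.
Studentinfo14 = pd.DataFrame({
    "Full Name": ["Student A", "Student B", "Student C"],
    "Pm S1": ["Check", None, "Art"],
    "Am S1": ["Math", "Check", None],
})

class_columns = ["Pm S1", "Am S1"]

# Start with "no row matches", then switch a row on if any class column
# contains the search text. na=False treats empty cells as non-matches.
mask = pd.Series(False, index=Studentinfo14.index)
for column in class_columns:
    mask |= Studentinfo14[column].str.contains("Check", na=False)

found = Studentinfo14[mask]
print(found)
```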

6/19/17 Notes (fixing the amount of names displayed, and searching a specific class)

Still trying to make all the names appear on the master combined file. Tried combining Attendance with the Confidence survey data first (making sure Attendance was the left merge), and then adding the other survey data. This did not work. I’m going to add a cell that searches for names after each cell that merges data frames, so I can see when Ella Hubbell and the other names get lost. They are lost after the first merge, so I need to fix the way I merge the cells. I reread my code and the example code on the left merge tutorial, and I realized that I needed to add this command: how='left', to make it a left merge. This worked! Here’s the full command: ConAttSem14 = pd.merge(left=ConAtt14,right=SemanticsS14, how='left', left_on='Full Name', right_on='Full Name') Now the cell that searches class names is also working! Now I’m working on a way to search multiple columns so that I can display the names of people who took a ce...
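A minimal sketch of why how='left' brings the missing names back, assuming two small made-up tables (the scores are placeholders, not real survey data). Without how=, pd.merge does an inner merge and keeps only names found in both tables. A left merge keeps every row of the left table and leaves the survey columns empty where there is no match.

```python
import pandas as pd

# Hypothetical stand-ins for the real tables; the numbers are placeholders.
ConAtt14 = pd.DataFrame({"Full Name": ["Ella Hubbell", "Cierra Black"],
                         "Confidence": [4, 5]})
SemanticsS14 = pd.DataFrame({"Full Name": ["Cierra Black"],
                             "Semantics": [3]})

# Default (inner) merge: Ella Hubbell disappears because she has no Semantics row.
inner = pd.merge(left=ConAtt14, right=SemanticsS14, on="Full Name")

# Left merge: every name in ConAtt14 is kept; missing survey values become NaN.
ConAttSem14 = pd.merge(left=ConAtt14, right=SemanticsS14, how="left", on="Full Name")

print(inner)
print(ConAttSem14)
```

Since both tables use the same column name, on='Full Name' is shorthand for passing the same name to left_on and right_on.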