6/3/17 Notes (Trying to get SM to replace misspelled names)

Trying to get the rstrip function to work
  • Tried str.rstrip(); this did not work
  • Tried str.rstrip(" "); this did not work
  • Tried str.rstrip(' '); this did not work
  • Tried str.rstrip ( ); this did not work
  • Tried adding lstrip too; this did not work
  • I made a practice notebook with two simple CSV files. Because I couldn't get the strip function to work, I added 8s to the end of the names instead of extra spaces, so I could see whether the function was working. It was not deleting the 8s.
  • I tried this command just to see if it would work:
    • s = "Rachel8"
    • s.strip("8")
    • This worked (it returned "Rachel"), so I need to find a way to run strip on the names in the file I want to open.
  • I could not find a way to do that, so instead I tried this command:
    • for line in Space['Full Name'].readlines():
          cleaned_line = line.replace(" ", "")
      cleaned_line
    • This did not work (a DataFrame column has no readlines() method; that method belongs to open file objects).
    • I tried typing "for line in Space.readlines('First Name'):". This did not work either, for the same reason.
  • Tried using re.sub to remove the xs (I replaced the 8s with x in the names). Here is the command I typed:
    • import re
      replaced = re.sub('x', '', Space['First Name'])
      print(replaced)
    • This did not work. I got an error that says "expected string or buffer" (re.sub works on a single string, not on a whole column).
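    • Tried opening the file first:
      • import re
        with open(Space) as f:
            replaced = re.sub('x', '', f)
        print(replaced)
      • This did not work (open() expects a file name rather than a DataFrame, and re.sub still needs a string rather than a file object).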
  • Dr. McColgan suggested trying this command (f and g refer to the two files loaded in the practice notebook):
    • i=0
      j=0
      for row in f['First_Name']:
          for ROW in g['First_Name']:
              s1=row
              s2=ROW
              print(s1, s2, SM(None, s1, s2).ratio())
              if (SM(None, s1, s2).ratio()>0.7):
                  row=s2
                  f.loc[i] = s2
                  j=j+1
          i=i+1
      print(f)
    • This printed the SM ratio for every pair of names, but when we tried to merge the two files, not all of the names appeared. (Note that f.loc[i] = s2 overwrites every column in row i, not just First_Name, which may be why the merge lost names.)
  • Then, we tried this:
    • i=0
      j=0
      for row in Space['First_Name']:
          for ROW in Nospace['First_Name']:
              s1=row
              s2=ROW
              print(s1, s2, SM(None, s1, s2).ratio())
              if (SM(None, s1, s2).ratio()>0.7):
                  row=s2
                  Space.loc[i,j] = s2
                  j=j+1
          i=i+1
      print(Space.loc[:,:])
    • This found the matches, but instead of replacing the names it added three extra columns, with each matched name appearing once across them. (Space.loc[i,j] uses the counter j as a column label, so pandas creates a new column for every match.)
  • We did not want the three extra columns, so we tried this:
    • i=0
      j=0
      for row in Space['First_Name']:
          for ROW in Nospace['First_Name']:
              s1=row
              s2=ROW
          i=i+1
      Space.First_Name = Space.First_Name.replace(SM(None, s1, s2).ratio()>0.7, s2)
      Space
    • This caused the same problem as the previous code. (The replace call also runs only once, after both loops finish, and looks for cells equal to the single Boolean value SM(None, s1, s2).ratio()>0.7, so it does not replace any of the names.)
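  • Then Dr. McColgan tried this:
    • from difflib import SequenceMatcher as SM  # assumes SM is difflib's SequenceMatcher
      i=0  # row label; assumes Spaces has the default 0, 1, 2, ... index
      for rows in Spaces['First_Name']:
          print(rows)
          for rws in noSpaces['First_Name']:
              print('noSpaces', rws)
              # Overwrite the Spaces name when the similarity ratio is above 0.7
              if (SM(None, rows, rws).ratio()>0.7):
                  print(rows, rws, SM(None, rows, rws).ratio())
                  Spaces.loc[i,'First_Name']=rws
                  print(Spaces.loc[i,'First_Name'])
          i=i+1
      Spaces
    • This worked!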
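The strip, replace, and re.sub attempts in these notes call string functions on an entire DataFrame column, but those functions work on one string at a time. A minimal sketch of the element-wise alternative, assuming a DataFrame with a 'First Name' column like Space (the sample names are made up for illustration), uses pandas' .str accessor, which applies the string method to every value in the column:

```python
import pandas as pd

# Hypothetical sample data standing in for the practice CSV file.
space = pd.DataFrame({"First Name": ["Rachel   ", "  Tom", "Annax"]})

# On a single string, strip works directly.
single = "Rachel8".rstrip("8")

# On a column, .str.strip() removes leading and trailing spaces from every value.
space["First Name"] = space["First Name"].str.strip()

# .str.replace() with a regex removes trailing x markers from every value.
space["First Name"] = space["First Name"].str.replace(r"x+$", "", regex=True)

print(single)
print(space["First Name"].tolist())
```

The regex r"x+$" only removes xs at the end of a name, so a real name ending in x (such as Max) would also lose its last letter; a marker character that never appears in names avoids that.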

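The working loop compares every name in Spaces with every name in noSpaces and overwrites the Spaces name whenever the SequenceMatcher ratio is above 0.7. A minimal sketch of the same idea as a reusable function, assuming SM is difflib.SequenceMatcher; the function name closest_name and the sample names are hypothetical. Unlike the loop, which keeps the last candidate above the threshold, this version keeps the best-scoring one:

```python
from difflib import SequenceMatcher


def closest_name(name, candidates, threshold=0.7):
    """Return the candidate most similar to name, or name itself if no
    candidate's SequenceMatcher ratio is above threshold."""
    best, best_ratio = name, threshold
    for candidate in candidates:
        ratio = SequenceMatcher(None, name, candidate).ratio()
        # Strictly greater, matching the > 0.7 test in the notes.
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best


# Hypothetical correctly spelled names and misspelled versions.
correct = ["Rachel", "Thomas", "Anna"]
misspelled = ["Rachle", "Tomas", "Zed"]
fixed = [closest_name(n, correct) for n in misspelled]
print(fixed)
```

Applied to the DataFrames from the notes, this could be used as Spaces['First_Name'] = Spaces['First_Name'].map(lambda n: closest_name(n, noSpaces['First_Name'])), which also avoids depending on the row labels that Spaces.loc[i, ...] assumes.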