6/3/17 Notes (Trying to get SM to replace misspelled names)
Trying to get the rstrip function to work
- Tried str.rstrip(); this did not work
- Tried str.rstrip(" "); this did not work
- Tried str.rstrip(' '); this did not work
- Tried str.rstrip ( ); this did not work
- Tried adding lstrip too; this did not work
- I made a practice notebook with two simple CSV files. Because I couldn't get the strip function to work, I added 8s to the end of the names instead of extra spaces, so I could see whether the function was doing anything. It was not deleting the 8s.
- I tried this command just to see if it would work:
- Str = "Rachel8"
- Str.strip("8")
- This worked, so I needed to find a way to apply strip to the file I wanted to open instead of to a single string.
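str.strip works on one string at a time, and its argument has to be a string of characters to remove ("8", not the integer 8). To run the same cleanup on every value in a column, pandas offers vectorized string methods under the .str accessor. A minimal Python 3 sketch, assuming the practice data is loaded as a pandas DataFrame with a "First Name" column; the sample names are made up:

```python
import pandas as pd

# Hypothetical stand-in for the practice CSV: names padded with 8s and spaces
space = pd.DataFrame({"First Name": ["Rachel8", "Anna88", "Tom   "]})

# One string: the characters to strip are passed as a string
single = "Rachel8".strip("8")

# A whole column: .str applies rstrip to every value.
# "8 " means "remove any trailing 8s or spaces".
# The result is a new Series, so it must be assigned back to the column.
space["First Name"] = space["First Name"].str.rstrip("8 ")

print(single)
print(space["First Name"].tolist())
```

Because string methods return new values instead of changing the original, skipping the assignment back to the column can make it look as if the strip did nothing.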
- I could not find a way to get str to work on the file I wanted, so instead I tried this command:
- for line in Space['Full Name'].readlines():
      cleaned_line = line.replace(" ","")
      cleaned_line
- This did not work
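readlines() is a method of file objects, while Space['Full Name'] is a pandas Series (assuming Space was loaded with pandas, which the later .loc calls suggest), so that call fails. A Series can be looped over directly. The loop above also computed cleaned_line without storing it anywhere. A minimal Python 3 sketch with made-up data that keeps the original replace(" ", "") logic and writes the results back:

```python
import pandas as pd

# Hypothetical stand-in for the Space DataFrame
space = pd.DataFrame({"Full Name": ["Rachel Smith  ", "Tom  Lee"]})

# A Series is iterable, so no readlines() is needed
cleaned = []
for name in space["Full Name"]:
    # Remove every space, as in the original attempt, and keep the result
    cleaned.append(name.replace(" ", ""))

# Store the cleaned values back in the DataFrame
space["Full Name"] = cleaned
print(space["Full Name"].tolist())
```

Note that replace(" ", "") removes every space, including the one between a first and last name; rstrip() would remove only the trailing ones.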
- I tried typing "for line in Space.readlines('First Name'):". This did not work
- Tried using re.sub to remove the x's (I had replaced the 8s with x in the names). Here is the command I typed:
- import re
  replaced = re.sub('x', '', Space['First Name'])
  print replaced
- This did not work. I got an error that says "expected string or buffer".
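The "expected string or buffer" error is consistent with re.sub taking a single string, while Space['First Name'] is a whole pandas Series. Two ways around this, sketched in Python 3 with made-up names: apply re.sub to each value, or use the vectorized .str.replace. The pattern x+$ is an assumption: it removes x's only at the end of a name, so a name like "Maxine" is left alone, whereas the original pattern 'x' would remove every x.

```python
import re

import pandas as pd

# Hypothetical stand-in for the practice data, with x appended to names
space = pd.DataFrame({"First Name": ["Rachelx", "Annaxx", "Maxine"]})

# Option 1: call re.sub once per value, since it only accepts a single string
via_apply = space["First Name"].apply(lambda s: re.sub(r"x+$", "", s))

# Option 2: the vectorized equivalent on the .str accessor
via_str = space["First Name"].str.replace(r"x+$", "", regex=True)

print(via_apply.tolist())
print(via_str.tolist())
```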
- Tried adding this command:
- import re
  with open(Space) as f:
      replaced = re.sub('x', '', f)
  print replaced
- This did not work
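open() expects a file path rather than a DataFrame, and re.sub needs the file's text (from f.read()) rather than the file object itself. If the goal is to clean the raw CSV before pandas loads it, a Python 3 sketch could look like this; the file name practice.csv, its columns, and the rule "x's right before a comma end a name field" are all hypothetical:

```python
import io
import os
import re
import tempfile

import pandas as pd

# Write a small hypothetical CSV so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "practice.csv")
with open(path, "w") as out:
    out.write("First Name,Age\nRachelx,20\nAnnaxx,21\n")

# open() takes a path; re.sub works on the text returned by f.read()
with open(path) as f:
    text = f.read()

# Remove runs of x that sit right before a comma (end of the name field)
cleaned = re.sub(r"x+,", ",", text)

# Load the cleaned text into a DataFrame
df = pd.read_csv(io.StringIO(cleaned))
print(df["First Name"].tolist())
```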
- Dr. McColgan suggested trying this command (f and g refer to the two files loaded in the practice notebook):
- i=0
  j=0
  for row in f['First_Name']:
      for ROW in g['First_Name']:
          s1=row
          s2=ROW
          print s1, s2, SM(None, s1, s2).ratio()
          if (SM(None, s1, s2).ratio()>0.7):
              row=s2
              f.loc[i] = s2
              j=j+1
      i=i+1
  print f
- This computed the SM ratio for every pair of names, but when we tried to merge the two files, not all of the names appeared.
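The SM(None, s1, s2) calls match the signature of Python's difflib.SequenceMatcher(isjunk, a, b), so this sketch assumes SM was imported as `from difflib import SequenceMatcher as SM`. ratio() returns 2*M/T, where M is the number of matched characters and T is the total length of both strings. A few made-up pairs checked against the 0.7 threshold used above (Python 3):

```python
from difflib import SequenceMatcher as SM

pairs = [("Rachel", "Rachl"), ("Rachel", "Rachel"), ("Rachel", "Tom")]

# ratio() = 2 * M / T: M = matched characters, T = combined length
ratios = {}
for s1, s2 in pairs:
    ratios[(s1, s2)] = SM(None, s1, s2).ratio()
    # Same threshold as in the notes: above 0.7 counts as the same name
    print(s1, s2, round(ratios[(s1, s2)], 3), ratios[(s1, s2)] > 0.7)
```

For "Rachel" and "Rachl" the matching blocks are "Rach" and "l", so M = 5 and T = 11, giving 10/11 (about 0.909), well above 0.7. "Rachel" and "Tom" share no characters, so their ratio is 0.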
- Then, we tried this:
- i=0
  j=0
  for row in Space['First_Name']:
      for ROW in Nospace['First_Name']:
          s1=row
          s2=ROW
          print s1, s2, SM(None, s1, s2).ratio()
          if (SM(None, s1, s2).ratio()>0.7):
              row=s2
              Space.loc[i,j] = s2
              j=j+1
      i=i+1
  print Space.loc[:,:]
- This worked, but it added three extra columns, with each name displayed once across the three columns.
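A likely cause of the extra columns: .loc is label-based, and Space.loc[i,j] uses the counter j as a column label. When that label does not exist yet, pandas adds a new column instead of raising an error, so each match would write into a fresh column named 0, 1, 2. A small Python 3 demonstration with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"First_Name": ["Rachl", "Tm"]})

# Column label 0 does not exist, so pandas adds a new column named 0
df.loc[0, 0] = "Rachel"
print(df.columns.tolist())

# Naming the existing column overwrites the cell instead
df2 = pd.DataFrame({"First_Name": ["Rachl", "Tm"]})
df2.loc[0, "First_Name"] = "Rachel"
print(df2.columns.tolist(), df2["First_Name"].tolist())
```

Passing the existing column name, as with df2, changes the cell in place without adding columns.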
- We did not want the three extra columns, so we tried this:
- i=0
  j=0
  for row in Space['First_Name']:
      for ROW in Nospace['First_Name']:
          s1=row
          s2=ROW
          i=i+1
          Space.First_Name = Space.First_Name.replace(SM(None, s1, s2).ratio()>0.7, s2)
  Space
- This caused the same problem as the previous attempt.
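Series.replace(to_replace, value) swaps values that are equal to to_replace. Passing SM(...).ratio()>0.7 hands it a single True or False, so it looks for cells equal to that boolean rather than applying a condition. One alternative, sketched in Python 3 with hypothetical stand-in DataFrames: collect the misspelled-to-correct pairs that pass the threshold in a dict, then give the dict to replace().

```python
from difflib import SequenceMatcher as SM

import pandas as pd

# Hypothetical stand-ins for the Space and Nospace DataFrames
space = pd.DataFrame({"First_Name": ["Rachl", "Tm", "Anna"]})
nospace = pd.DataFrame({"First_Name": ["Rachel", "Tom", "Anna"]})

# Build {misspelled: correct} from pairs that pass the 0.7 threshold
mapping = {}
for s1 in space["First_Name"]:
    for s2 in nospace["First_Name"]:
        if s1 != s2 and SM(None, s1, s2).ratio() > 0.7:
            mapping[s1] = s2

# replace() with a dict swaps every value found as a key for its mapped value
space["First_Name"] = space["First_Name"].replace(mapping)
print(mapping)
print(space["First_Name"].tolist())
```

If one name passes the threshold against several reference names, the last one checked wins, the same as in the loops above.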
- Then Dr. McColgan tried this:
- from difflib import SequenceMatcher as SM
  i=0
  j=0
  for rows in Spaces['First_Name']:
      print rows
      for rws in noSpaces['First_Name']:
          print 'noSpaces',rws
          if(SM(None,rows,rws).ratio()>0.7):
              print rows,rws,SM(None,rows,rws).ratio()
              # i is used as a row label, so this assumes Spaces has the default 0..n-1 index
              Spaces.loc[i,'First_Name']=rws
              print Spaces.loc[i,'First_Name']
              j=j+1
      i=i+1
  Spaces
- This worked!
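The working loop keeps every match above 0.7 and relies on i lining up with the row labels. Python's difflib also provides get_close_matches(word, possibilities, n, cutoff), which scores candidates with SequenceMatcher and returns the best ones. A Python 3 sketch of the same correction using it, with hypothetical stand-in data; best_match is a made-up helper name:

```python
from difflib import get_close_matches

import pandas as pd

# Hypothetical stand-ins for the Spaces and noSpaces DataFrames
spaces = pd.DataFrame({"First_Name": ["Rachl", "Tm", "Zed"]})
no_spaces = pd.DataFrame({"First_Name": ["Rachel", "Tom", "Anna"]})
reference = no_spaces["First_Name"].tolist()


def best_match(name, choices, cutoff=0.7):
    """Return the closest name in choices, or the original name if none is close enough."""
    matches = get_close_matches(name, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else name


# map() applies best_match to every value and keeps rows aligned,
# so no manual i/j counters are needed
spaces["First_Name"] = spaces["First_Name"].map(lambda s: best_match(s, reference))
print(spaces["First_Name"].tolist())
```

Two differences from the loop: get_close_matches keeps candidates whose ratio is at least the cutoff (>= rather than >), and n=1 picks the single highest-scoring name instead of the last match found.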