6/1/17 Notes (Problems with merging and data frame size)


  • I checked the shape of the Attitudes survey after deleting the duplicates, and each year has a different number of columns. Looking at the surveys, I think this is because different questions were asked in different years, so I will have to write separate commands for each year.
  • Trying to append the Attitudes data to the Confidence data
    • The command “ConfidenceS14.append(AttitudesS14)” did not work. I think this is because DataFrame.append stacks the rows of one data frame under the other. It doesn't join their columns side by side, which is what I need.
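    • I am going to try this syntax that I found online. I thought it would only add new columns, so that no columns would repeat. This did not work either, since it is the same append method: verify_integrity only checks for repeated index values (row labels), not repeated columns.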
      • DataFrame.append(other, ignore_index=False, verify_integrity=False)
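    • To see the difference between stacking rows and joining columns, here is a minimal sketch with two tiny made-up frames (the names and numbers are hypothetical, not survey data). It assumes a recent pandas, where DataFrame.append has been removed (pandas 2.0), so pd.concat stands in for the row-stacking that append did:

```python
import pandas as pd

# Hypothetical stand-ins for the Confidence and Attitudes surveys
confidence = pd.DataFrame({"First Name": ["Ann", "Ben"], "Conf1": [4, 2]})
attitudes = pd.DataFrame({"First Name": ["Ann", "Ben"], "Att1": [5, 3]})

# Row-wise (what append did): the rows are stacked, and each frame's
# missing columns are filled with NaN
stacked = pd.concat([confidence, attitudes], ignore_index=True)
print(stacked.shape)  # (4, 3)

# Column-wise join on a key: one row per person, columns side by side
joined = pd.merge(confidence, attitudes, on="First Name")
print(joined.shape)  # (2, 3)
```

The stacked version has four rows and half-empty columns, which is why appending doesn't give one row per person.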
    • I am going to try a different approach that uses Python's csv module, which is specifically for CSV files:
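      • import csv

        # Python 3: open CSV files in text mode with newline='' (Python 2 versions of this snippet use 'rb' and 'wb')
        # Map the first two columns of each row in data.txt to the full row
        with open('data.txt', newline='') as match_data:
            reader = csv.reader(match_data)
            match_data = {tuple(row[:2]): row for row in reader}

        # For each row of m_list.txt that equals one of those keys, write out the matching data.txt row
        with open('m_list.txt', newline='') as match_list, open('done.txt', 'w', newline='') as outfile:
            reader = csv.reader(match_list)
            writer = csv.writer(outfile)
            for row in reader:
                row = tuple(row)
                if row in match_data:
                    writer.writerow(match_data[row])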
      • Here is what I typed:
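          import ConfidenceS14  # mistake: this should be "import csv"; ConfidenceS14 is a data frame, not a module

          # 'ConfidenceS14' is a shorthand name, not the full file name; also, csv in Python 3 needs text mode ('r'), not 'rb'
          with open('ConfidenceS14', 'rb') as match_data:
              reader = csv.reader(match_data)
              match_data = {tuple(row[:2]): row for row in reader}  # keys are only the first two columns

          with open('AttitudesS14', 'rb') as match_list, open('done.txt', 'wb') as outfile:
              reader = csv.reader(match_list)
              writer = csv.writer(outfile)

              for row in reader:
                  row = tuple(row)  # this is the whole row, so it only equals a key
                  if row in match_data:  # when the Attitudes row has exactly two columns
                      writer.writerow(match_data[row])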

    • This did not work
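    • For reference, here is a sketch of how the csv module could match rows by a name column instead of by whole rows. The helper join_csv_on_key and the demo file names are hypothetical, and it assumes each first name appears only once in the first file:

```python
import csv
import os
import tempfile


def join_csv_on_key(left_path, right_path, out_path, key="First Name"):
    """Hypothetical helper: inner-join two CSV files on one shared column.

    Assumes key values are unique in the left file and that the files
    share no column names other than the key.
    """
    # Index every row of the left file by its key value
    with open(left_path, newline="") as f_left:
        left_reader = csv.DictReader(f_left)
        left_fields = list(left_reader.fieldnames)
        left_rows = {row[key]: row for row in left_reader}

    with open(right_path, newline="") as f_right, open(out_path, "w", newline="") as f_out:
        right_reader = csv.DictReader(f_right)
        # Output columns: every left column, then the right columns except the key
        out_fields = left_fields + [c for c in right_reader.fieldnames if c != key]
        writer = csv.DictWriter(f_out, fieldnames=out_fields)
        writer.writeheader()
        for right_row in right_reader:
            left_row = left_rows.get(right_row[key])
            if left_row is not None:  # names missing from the left file are skipped
                writer.writerow({**left_row, **right_row})


# Demo: two tiny made-up files in a temporary folder
tmp_dir = tempfile.mkdtemp()
conf_path = os.path.join(tmp_dir, "confidence.csv")
att_path = os.path.join(tmp_dir, "attitudes.csv")
out_path = os.path.join(tmp_dir, "joined.csv")

with open(conf_path, "w", newline="") as f:
    f.write("First Name,Conf1\nAnn,4\nBen,2\n")
with open(att_path, "w", newline="") as f:
    f.write("First Name,Att1\nBen,3\nCal,5\n")

join_csv_on_key(conf_path, att_path, out_path)

with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)  # [['First Name', 'Conf1', 'Att1'], ['Ben', '2', '3']]
```

Names that are only in one of the files are left out, which is an inner join, the same default that pd.merge uses.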
    • I found a different method that compares two files line by line. Here is the example:
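      • # Write every line of file 1 that also appears in file 2
        with open('some_file_1.txt', 'r') as file1, open('some_file_2.txt', 'r') as file2, open('some_output_file.txt', 'w') as FO:
            file2_lines = file2.readlines()  # read once; a file object can only be looped over once
            for line1 in file1:
                for line2 in file2_lines:
                    if line1 == line2:
                        FO.write(line1)  # line1 already ends with a newline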
      • It kept telling me that my files didn't exist, so I replaced their shorthand names with the full file names.
      • Then I got an error saying that I needed an integer for FO. Thinking about it, this only matches lines that are exactly identical in both files; it doesn't match rows up by name, so I am going to look for a different command.
    • A command called “merge” combines two data frames based on a key column that both files share (for these surveys, “First Name”).
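    • Originally I tried this format: merge(ConfidenceS14,AttitudesS14,by"First Name",all=True). This did not work, because it is R's merge syntax (in R it would be merge(x, y, by = "First Name", all = TRUE)), not pandas syntax.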
    • Then I tried this format: merged_inner = pd.merge(left=ConfidenceS14,right=AttitudesS14, left_on='First Name', right_on='First Name')
      • This worked, but the output only showed part of each chart.
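      • One thing that could explain some of the missing data: pd.merge does an inner join by default, so anyone whose first name isn't in both surveys is dropped (the hidden columns are a separate, display-only issue). A small sketch with made-up names comparing how="inner" and how="outer", which is the pandas equivalent of R's all=TRUE; indicator=True adds a _merge column showing where each row came from:

```python
import pandas as pd

# Made-up data: Ben is in both surveys, Ann only in Confidence, Cal only in Attitudes
confidence = pd.DataFrame({"First Name": ["Ann", "Ben"], "Conf1": [4, 2]})
attitudes = pd.DataFrame({"First Name": ["Ben", "Cal"], "Att1": [3, 5]})

# how="inner" (the default) keeps only names found in both frames
inner = pd.merge(confidence, attitudes, on="First Name", how="inner")
print(inner.shape)  # (1, 3)

# how="outer" keeps everyone; missing answers become NaN
outer = pd.merge(confidence, attitudes, on="First Name", how="outer", indicator=True)
print(outer[["First Name", "_merge"]])
```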
  • This method can only merge two data frames at a time, but eventually I want to merge all the files together. So I will either have to find a different way to merge, or keep merging each new file with the already-merged one, like some kind of Python merge-inception.
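  • A sketch of the "merge-inception" idea, assuming every survey frame has a 'First Name' column: functools.reduce merges the first two frames, then merges that result with the next frame, and so on. The frames here are made-up placeholders for the real surveys:

```python
from functools import reduce

import pandas as pd

# Hypothetical placeholders for the Confidence, Attitudes and Semantics surveys
confidence = pd.DataFrame({"First Name": ["Ann", "Ben"], "Conf1": [4, 2]})
attitudes = pd.DataFrame({"First Name": ["Ann", "Ben"], "Att1": [5, 3]})
semantics = pd.DataFrame({"First Name": ["Ann", "Ben"], "Sem1": [1, 2]})

surveys = [confidence, attitudes, semantics]

# ((confidence merged with attitudes) merged with semantics), one pair at a time
merged_all = reduce(lambda left, right: pd.merge(left, right, on="First Name"), surveys)
print(merged_all.shape)  # (2, 4)
```

Adding another year or survey then only means adding its frame to the surveys list.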
  • Fixing issue with data frame size.
    • Because the merge output only shows part of each chart, Dr. McColgan suggested that I set the data frame size to be large enough to fit all the data.
    • Here is a command I found to create an empty data frame:
      • > df <- data.frame(matrix(ncol = 300, nrow = 100))
        > dim(df)
        [1] 100 300
      • I tried this, but I got an error saying that matrix is not defined. That is because this is R code, not Python: matrix() and data.frame() are R functions.
      • I kept getting errors, so I tried just displaying the new data frame using loc instead. This displayed more columns but fewer rows.
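      • For reference, a minimal pandas version of the R snippet, assuming numpy is available. Building an empty frame doesn't change how much of an existing frame pandas prints, though; that is controlled by pandas' display options:

```python
import numpy as np
import pandas as pd

# pandas version of the R snippet: 100 rows x 300 columns, every cell empty (NaN)
df = pd.DataFrame(np.nan, index=range(100), columns=range(300))
print(df.shape)  # (100, 300)
```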
    • I found a command that stops pandas from having a maximum column number:
      • pandas.set_option('display.max_columns', None)
      • This Worked!
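      • A short sketch of the same display settings, using a made-up 50-column frame. pd.option_context applies the settings only inside the with block, and display.max_rows works the same way for rows, which might help with the missing rows noted above:

```python
import pandas as pd

# A made-up wide frame with 50 question columns
wide = pd.DataFrame([[0] * 50], columns=[f"Q{i}" for i in range(50)])

# Show all columns and rows for this one print only; the old settings come back afterwards
with pd.option_context("display.max_columns", None, "display.max_rows", None):
    print(wide)

# The same setting for the whole session (None means "no limit")
pd.set_option("display.max_columns", None)
print(pd.get_option("display.max_columns"))  # None
```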
  • I was able to add the Semantics data using this code:
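import pandas as pd

# Show every column instead of hiding the middle ones
pd.set_option('display.max_columns', None)

# Merge the Confidence + Attitudes frame (ConAtt14) with the Semantics survey on first name
ConAttSem14 = pd.merge(left=ConAtt14, right=SemanticsS14, on='First Name')

# What's the size of the output data? (rows, columns)
print(ConAttSem14.shape)
ConAttSem14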

    • “ConAtt14” is what I named the data frame of the merged Confidence and Attitudes data.
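    • Something to check when merging on 'First Name': if two participants share a first name, pd.merge pairs every matching row on one side with every matching row on the other, so rows multiply. A sketch with made-up names; validate="one_to_one" makes pandas raise a MergeError instead of merging silently:

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up frames where two different people are both named "Ann"
left = pd.DataFrame({"First Name": ["Ann", "Ann", "Ben"], "Conf1": [4, 1, 2]})
right = pd.DataFrame({"First Name": ["Ann", "Ann", "Ben"], "Sem1": [5, 6, 3]})

# Without a check: 2 x 2 "Ann" pairs + 1 "Ben" pair = 5 rows
print(pd.merge(left, right, on="First Name").shape)  # (5, 3)

# With validate="one_to_one", duplicate keys raise an error instead
error_message = None
try:
    pd.merge(left, right, on="First Name", validate="one_to_one")
except MergeError as err:
    error_message = str(err)
print(error_message)
```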
