6/1/17 Notes (Problems with merging and data frame size)

I checked the shape of the attitudes survey after the duplicates were deleted, and all the years have different numbers of columns. Looking at the surveys, I think this is because different questions were asked in different years, so I will have to type different commands for each year.
Trying to append the Attitudes data to the Confidence data

The command “ConfidenceS14.append(AttitudesS14)” did not work. I think it didn’t work because this function is for connecting lists and I was trying to connect two data frames.
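Note on append: a minimal sketch of stacking two data frames, using tiny made-up frames in place of ConfidenceS14 and AttitudesS14 (every column name except 'First Name' is an assumption). One possible reason the append call seemed to do nothing is that the pandas DataFrame.append method returned a new frame instead of changing the original, so the result had to be assigned to a variable. That method was removed in pandas 2.0, and pd.concat is the replacement.

```python
import pandas as pd

# Made-up stand-ins for the real survey frames (column names are assumptions).
confidence = pd.DataFrame({"First Name": ["Ana", "Ben"], "Confidence Q1": [4, 5]})
attitudes = pd.DataFrame({"First Name": ["Ana", "Cy"], "Attitudes Q1": [2, 3]})

# Stack the rows of both frames. Columns missing from one frame are filled with NaN.
# The result is a new frame; the two inputs are left unchanged.
stacked = pd.concat([confidence, attitudes], ignore_index=True, sort=False)

print(stacked.shape)  # (4, 3)
```

Stacking puts the same person on two separate rows, so it does not line people up by name. That is what a merge on a shared column is for.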
I am going to try this syntax that I found online. This syntax should only add new columns so that there won’t be any repeating columns. This did not work.

import csv

with open('data.txt', 'rb') as match_data:
    reader = csv.reader(match_data)
    match_data = {tuple(row[:2]): row for row in reader}

with open('m_list.txt', 'rb') as match_list, open('done.txt', 'wb') as outfile:
    reader = csv.reader(match_list)
    writer = csv.writer(outfile)
    for row in reader:
        row = tuple(row)
        if row in match_data:
            writer.writerow(match_data[row])

Here is what I typed:

import ConfidenceS14

with open('ConfidenceS14', 'rb') as match_data:
    reader = csv.reader(match_data)
    match_data = {tuple(row[:2]): row for row in reader}

with open('AttitudesS14', 'rb') as match_list, open('done.txt', 'wb') as outfile:
    reader = csv.reader(match_list)
    writer = csv.writer(outfile)
    for row in reader:
        row = tuple(row)
        if row in match_data:
            writer.writerow(match_data[row])

This did not work.
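Note on why this may have failed: "import ConfidenceS14" asks Python for a module with that name, and the csv module itself is never imported. On Python 3 the csv module also expects files opened in text mode, not 'rb'/'wb'. Here is a sketch of the same idea with those points changed. The function name match_rows and its parameters are made up for illustration, and it assumes the survey data is saved as CSV files.

```python
import csv


def match_rows(data_path, list_path, out_path, key_len=2):
    """Copy rows of data_path whose first key_len fields equal a whole row of list_path.

    Returns the number of rows written to out_path.
    """
    # Index the data file by its first key_len fields.
    with open(data_path, newline="") as data_file:
        by_key = {tuple(row[:key_len]): row for row in csv.reader(data_file)}

    written = 0
    with open(list_path, newline="") as list_file, \
            open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(list_file):
            key = tuple(row)  # the whole row is used as the key
            if key in by_key:
                writer.writerow(by_key[key])
                written += 1
    return written
```

A call would look like match_rows('ConfidenceS14.csv', 'AttitudesS14.csv', 'done.txt'), where the file names are hypothetical. Note that the second file has to contain only the key fields on each row. A full survey row never equals a two-field key, so with two complete survey files nothing would match even once the errors are gone.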
I found a different method that compares two files line by line. Here is the example:

file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))

FO.close()
file1.close()
file2.close()

It kept telling me that my files didn’t exist, so I replaced the shorthand names with the full file names.
Then I got an error saying that I needed an integer for FO. Thinking about it, this matches up whole lines in the files, but not by name, so I am going to look for a different command.
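Note on the line-by-line example: apart from the errors, the nested loop has a quieter problem. The inner file is read to the end during the first pass of the outer loop, so only the first line of the first file is ever compared. A sketch that avoids this by loading one file into a set first (the function name common_lines is made up for illustration):

```python
def common_lines(path_a, path_b):
    """Return the lines of path_a that also occur anywhere in path_b, in file order."""
    # Read the second file once and remember every line it contains.
    with open(path_b) as file_b:
        lines_b = {line.rstrip("\n") for line in file_b}

    # Keep each line of the first file that was seen in the second file.
    with open(path_a) as file_a:
        return [line.rstrip("\n") for line in file_a if line.rstrip("\n") in lines_b]
```

This still only finds lines that are identical from start to end, so it cannot pair up two different surveys filled in by the same person.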

A command called “merge” combines data frames based on a shared ID column that appears in both files.
Originally I tried this format: merge(ConfidenceS14,AttitudesS14,by"First Name",all=True). This did not work.
Then I tried this format: merged_inner = pd.merge(left=ConfidenceS14,right=AttitudesS14, left_on='First Name', right_on='First Name')
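Note on the two formats: the first one, with by= and all=, follows the argument names of R's merge function, which pandas does not use. In pandas the closest equivalent of all=TRUE is how='outer'. A small sketch of the difference, using made-up frames (every column name except 'First Name' is an assumption):

```python
import pandas as pd

# Made-up stand-ins for ConfidenceS14 and AttitudesS14.
confidence = pd.DataFrame({"First Name": ["Ana", "Ben", "Cy"], "Confidence Q1": [4, 5, 3]})
attitudes = pd.DataFrame({"First Name": ["Ana", "Cy", "Dee"], "Attitudes Q1": [2, 3, 1]})

# Inner merge (the default): keeps only names found in BOTH frames.
inner = pd.merge(confidence, attitudes, on="First Name", how="inner")

# Outer merge: keeps every name from either frame, with NaN where data is missing.
outer = pd.merge(confidence, attitudes, on="First Name", how="outer")

print(inner.shape)  # (2, 3)
print(outer.shape)  # (4, 3)
```

When the key column has the same name in both frames, on='First Name' does the same job as writing left_on and right_on separately.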

This method can only merge two files at a time, but eventually I want to merge all the files together. I will either have to find a different way to merge, or I can keep merging each new file with the already merged file, like some kind of Python merge-inception.
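Note on merging many files: the "merge-inception" idea can be written as one step with functools.reduce, which feeds the result of each merge into the next one. This is only a sketch with made-up frames. The choice of how='outer' and every column name except 'First Name' are assumptions.

```python
from functools import reduce

import pandas as pd

# Made-up stand-ins for the Confidence, Attitudes and Semantics frames.
frames = [
    pd.DataFrame({"First Name": ["Ana", "Ben"], "Confidence Q1": [4, 5]}),
    pd.DataFrame({"First Name": ["Ana", "Ben"], "Attitudes Q1": [2, 3]}),
    pd.DataFrame({"First Name": ["Ana", "Ben"], "Semantics Q1": [1, 2]}),
]

# Merge the first two frames, then merge that result with the third, and so on.
merged_all = reduce(
    lambda left, right: pd.merge(left, right, on="First Name", how="outer"),
    frames,
)

print(merged_all.shape)  # (2, 4)
```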
Fixing issue with data frame size.

Because the merge function only gives me part of each of the charts, Dr. McColgan suggested I set the data frame size to be large enough to fit all the data.
Here is a command I found to create an empty data frame:

> df <- data.frame(matrix(ncol = 300, nrow = 100))
> dim(df)
[1] 100 300
I tried this but I’m getting an error saying that matrix is not defined.
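Note on the error: the command above is written in R, not Python, which would explain why Python reports that matrix is not defined. A rough pandas equivalent, as a sketch (the sizes 100 and 300 are simply copied from the R example):

```python
import pandas as pd

# An empty frame with 100 rows and 300 columns; every cell starts as NaN.
df = pd.DataFrame(index=range(100), columns=range(300))

print(df.shape)  # (100, 300)
```

A pandas data frame does not normally need to be sized ahead of time. The result of a merge is as large as the merge type and the matching keys make it.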
I keep getting errors, so I am going to try to just display the new file using loc. This displayed more columns but fewer rows.

ConAttSem14 = pd.merge(left=ConAtt14,right=SemanticsS14, left_on='First Name', right_on='First Name')

merged_inner

# what's the size of the output data?

pd.set_option('display.max_columns', None)

merged_inner.shape

ConAttSem14

“ConAtt14” is what I named the data frame made by merging the Confidence and Attitudes files.
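Note on the missing rows: one likely reason the merged frame has fewer rows is that pd.merge does an inner merge by default, so anyone whose first name is missing from either file is dropped. A sketch for checking this with the indicator option, using made-up frames in place of ConAtt14 and SemanticsS14 (every column name except 'First Name' is an assumption):

```python
import pandas as pd

# Made-up stand-ins for ConAtt14 and SemanticsS14.
con_att = pd.DataFrame({"First Name": ["Ana", "Ben", "Cy"], "Confidence Q1": [4, 5, 3]})
semantics = pd.DataFrame({"First Name": ["Ana", "Dee"], "Semantics Q1": [1, 2]})

# indicator=True adds a "_merge" column saying where each row came from:
# "both", "left_only" or "right_only".
check = pd.merge(con_att, semantics, on="First Name", how="outer", indicator=True)

# Rows found in only one of the two frames. An inner merge would drop these.
unmatched = check[check["_merge"] != "both"]

print(check.shape)
print(unmatched[["First Name", "_merge"]])
```

If two students share a first name, merging on 'First Name' alone will also pair them with each other, so a more specific key (for example first and last name together, if the surveys have both) may be safer.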

SienaSTEMdata2017