
Showing posts from May, 2017

5/31/17 Notes (Trying to get a chart to appear)

Notes: I wrote a list of what I want the new code to do, to help me work out the order in which to write each command. The first portion of the new code will mostly be copied and pasted from the original code, with the addition of the append code Torri mentioned. I'm going to start by simply adding the student information to the confidence survey information and make sure that runs smoothly before I try anything fancy. (Things I did that worked are in blue so I can find them easily for future reference.) The steps: define terms; import packages; bring up the survey results; delete duplicate survey responses; change the formatting of the survey results (invert the numbers of some responses and display the questions as column headers); append the student info to the new list of survey results so that the code matches up identical names; get Python to also match the survey year with the sheet that has the student info for that year. Would it be easier to have the full names of students in one cell instead of...

New Code Cell #5

ConfidenceS14.loc[:,['Timestamp','First Name','Last Name','School','Grade','Gender','Length',
                     'I am confident I have the ability to learn the material in STEM',
                     'I am confident I can do well in STEM',
                     'I think I will do as well as or better than other students in STEM',
                     'I do not think I will be successful in STEM',
                     'I am confident that I can understand the topics taught in STEM',
                     'I believe that if I exert enough  effort, I will be successful in STEM.',
                     'I feel like I do not know a lot about STEM compared to other students in urban scholars.',
                     'Compared with other students in this program, I think I have good study skills.',
                     'Compared with other students in this program, I do not feel like I am a good student.',
                     ...

New Code Cell #4

ConfidenceS14.columns = ['Timestamp','First Name','Last Name','School','Grade','Gender','Length',
                         'I am confident I have the ability to learn the material in STEM',
                         'I am confident I can do well in STEM',
                         'I think I will do as well as or better than other students in STEM',
                         'I do not think I will be successful in STEM',
                         'I am confident that I can understand the topics taught in STEM',
                         'I believe that if I exert enough  effort, I will be successful in STEM.',
                         'I feel like I do not know a lot about STEM compared to other students in urban scholars.',
                         'Compared with other students in this program, I think I have good study skills.',
                         'Compared with other students in this program, I do not feel like I am a good student.',
                         ...

New Code Cell #3

## CLEAN UP SURVEY (1) - Delete duplicate rows and NaN columns
ConfidenceS14 = clean_Dupli_NaN(ConfidenceS14)
ConfidenceS15 = clean_Dupli_NaN(ConfidenceS15)
ConfidenceS16 = clean_Dupli_NaN(ConfidenceS16)
ConfidenceS17 = clean_Dupli_NaN(ConfidenceS17)
ConfidenceS14.shape

New Code Cell #2

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import collections as coll
from scipy import stats

# Creates a DataFrame with data on the file
ConfidenceS14 = pd.read_csv('Confidence and Self-Efficacy in STEM S14 (Responses) - Form Responses.csv')
ConfidenceS15 = pd.read_csv('Confidence and Self-Efficacy in STEM-S15 (Responses) - Form Responses 1.csv')
ConfidenceS16 = pd.read_csv('Confidence and Self-Efficacy in STEM-S16 (Responses) - Form Responses 1.csv')
ConfidenceS17 = pd.read_csv('Confidence and Self-Efficacy in STEM-S17 (Responses) - Form Responses 1.csv')
ConfidenceS14.shape

New Code Cell #1

This is what I wrote for the first cell. So far I have been able to get a chart of the students and their survey answers to appear.

class Survey:
    def __init__(self, Fsurvey, name, year, limit, end):
        self.Size = Fsurvey.shape
        self.Name = name
        self.Year = year
        self.Students = Fsurvey.loc[:,['Timestamp','First Name', 'Last Name','Name of School','Grade in school','Gender (boy or girl)']]
        self.Questions = Fsurvey.columns[0:end]

    def list_Students(self):
        return self.Students

    def list_Questions(self):
        return self.Questions

# If two rows share the same ['Last Name', 'First Name'] key, keep only one
def clean_Dupli_NaN(vector):
    vector = vector.drop_duplicates(['Last Name','First Name'])
    ...

Cells #10 and #11 (Getting the percentages on the legend)

Code: This cell repeats the process done in cell 10 and makes a pie chart for every year. The argument “autopct='%1.1f%%'” makes the percentages appear on the pie chart itself. In '%1.1f', the first 1 is a minimum field width, '.1' means one digit after the decimal point, and f means the value is formatted as a “float”; the final '%%' prints a literal percent sign. Deleting this argument and adding the percentages to the labels makes them appear on the legend instead. This code gives a graph with a legend and percentages:
labels = ['Delaware Community School-36.8%','Albany High School-5.3%','Myers Middle School-10.5%','Homeschool-2.6%','KIPP-2.6%','North Albany Academy-2.6%','Pine Hill School-28.9%','Albany School of Humanity-2.6%','Hackett Middle School-5.3%','Homeschooled-2.6%']
sizes = sizes    # reuses sizes from cell 10
colors = colors  # reuses colors from cell 10
patches, text = plt.pie(sizes, colors=colors, shadow=True, startangle=200)
plt.legend(patches, labels, loc=[1,0])
plt.axis('equal')
plt.title('ATTITUDES SURVEY - SCHOOLS 2014')
plt.show()
...
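Typing the percentages into the labels by hand means they have to be retyped whenever the data changes. A minimal sketch of building the same "School-36.8%" style labels from the counts instead, assuming the counts come from a collections.Counter as in cell 10; the helper name legend_labels and the toy school list are hypothetical:

```python
import collections as coll

import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt


def legend_labels(counts):
    """Hypothetical helper: turn a Counter of school names into 'Name-36.8%' labels."""
    total = sum(counts.values())
    names = list(counts)  # one entry per distinct school
    labels = [f"{name}-{100 * counts[name] / total:.1f}%" for name in names]
    sizes = [counts[name] for name in names]
    return labels, sizes


# Hypothetical stand-in for the School column, ConfidenceS14.iloc[:, 3]
schools = ["School A", "School A", "School B", "School C"]
labels, sizes = legend_labels(coll.Counter(schools))

# No autopct here, so plt.pie returns only (patches, texts)
patches, texts = plt.pie(sizes, shadow=True, startangle=200)
plt.legend(patches, labels, loc=(1, 0))  # legend anchored at the right edge of the axes
plt.axis("equal")
plt.title("ATTITUDES SURVEY - SCHOOLS (example)")
plt.close("all")  # in the notebook, plt.show() would display the chart instead
```

With this toy list the labels come out as School A-50.0%, School B-25.0% and School C-25.0%, and they stay in sync with sizes because both are built from the same counts.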

Cell #10

Code:
ScountS14 = coll.Counter(ConfidenceS14.iloc[:,3])
This takes the column that holds each student's school (column 3) and counts how many students attend each school.
labels = list(set(ConfidenceS14.iloc[:,3]))
n = len(labels)
set() keeps only one copy of each school name, and list() turns that set back into a list, so labels holds each school exactly once. len() returns the number of items in a list, so n is the number of different schools.
sizes = []
for i in range(0,n):
    sizes.append(ScountS14[labels[i]])
This loop gives each “label” (school name) a size equal to the number of students who go to that school; plt.pie later turns these counts into percentages. The loop runs once for each of the n labels set up in the previous line.
“Colors” and “explode” set the colors of the pie slices and whether a slice is pulled out from the center of the pie. The named colors available for the chart are listed at ( https://matplotlib.org/examples/color/named_colors.html ). Changing the “startangle” number rotates the whole pie. Colors are assigned in the order of labels, and because labels comes from a set, that order (and so which color goes with which school) can change between runs. Deleted the part highlighted in red because before an error was showing up that said “...
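A small sketch of what Counter, set and the sizes loop each produce, using a hypothetical list of school names in place of ConfidenceS14.iloc[:,3]. It sorts the set (an addition not in the original cell) so the order of the labels, and therefore the colors, stays the same every run:

```python
import collections as coll

# Hypothetical stand-in for the School column, ConfidenceS14.iloc[:, 3]
schools = ["School A", "School B", "School A", "School C", "School B", "School B"]

# Counter tallies how many times each school name appears
ScountS14 = coll.Counter(schools)

# set() keeps one copy of each name; sorted() gives a stable list order,
# unlike list(set(...)), whose order can differ between Python sessions
labels = sorted(set(schools))
n = len(labels)  # number of distinct schools

# Same loop as in cell 10: one size (a raw count) per label
sizes = []
for i in range(0, n):
    sizes.append(ScountS14[labels[i]])

# plt.pie does this step itself: each slice is its count divided by the total
fractions = [s / sum(sizes) for s in sizes]
```

Here labels is ['School A', 'School B', 'School C'] and sizes is [2, 3, 1]; the pie chart only needs the raw counts, because it divides by the total on its own.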

May 29, 2017 -- Python -- Append

append() --> adds a passed object to the end of an existing list. Syntax: list.append(obj). There has to be a preexisting list for the new information to be added to. Example:
aList = [123, 'xyz', 'zara', 'abc']
aList.append(2009)
print("Updated List :", aList)
Prints: Updated List : [123, 'xyz', 'zara', 'abc', 2009]
^ demonstrating how the new item 2009 was added to the end of the preexisting list. In the case of the USP data, the code will append each student's survey results to the preexisting file that includes grade, school, gender, and the sessions taken while at USP. When completed, the code will match a student's survey results to that data by recognizing the same name and appending the survey responses to the end of the original information. Previously made list of student names:    Self.Studen...
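list.append only ever adds to the end of a list; it does not look anything up. Lining up survey rows with student-info rows by name is what a join does, and pandas provides one as DataFrame.merge. The sketch below swaps in merge for that matching step; this is a suggested alternative, not the approach from the original notes, and the two small tables are made-up rows:

```python
import pandas as pd

# list.append adds one object to the end of an existing list
aList = [123, "xyz", "zara", "abc"]
aList.append(2009)
print("Updated List :", aList)  # prints: Updated List : [123, 'xyz', 'zara', 'abc', 2009]

# Hypothetical student-info sheet and survey sheet (made-up rows)
students = pd.DataFrame({
    "First Name": ["Ana", "Ben"],
    "Last Name": ["Lopez", "Kim"],
    "Grade": [7, 8],
    "School": ["School A", "School B"],
})
survey = pd.DataFrame({
    "First Name": ["Ben", "Ana"],  # rows in a different order than students
    "Last Name": ["Kim", "Lopez"],
    "Q1": [4, 5],
})

# merge matches rows whose first AND last names are identical and adds the
# survey columns to the right of the student info; how="left" keeps every
# student even if they have no survey response
combined = students.merge(survey, on=["First Name", "Last Name"], how="left")
print(combined)
```

Matching the survey year could work the same way by adding a year column to both tables and including it in the on= list.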

Cell #9

This cell creates a table that shows each question next to the shorthand for that question used on the data charts. 'Name' displays the labels set up in cell 6, and 'Description' displays the questions that were shown in cell 4. Then the command "display_dat" uses a pandas function to display the shorthand versions of the questions in one column and the full questions in the column next to it.
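A minimal sketch of a two-column key like this in pandas, assuming the shorthand labels and full questions are plain Python lists; the shorthand names Q1 and Q2 are hypothetical, and the question wording is taken from the survey columns above:

```python
import pandas as pd

# Hypothetical shorthand labels paired with two of the real survey questions
short_names = ["Q1", "Q2"]
full_questions = [
    "I am confident I have the ability to learn the material in STEM",
    "I am confident I can do well in STEM",
]

# One row per question: shorthand in 'Name', full wording in 'Description'
question_key = pd.DataFrame({"Name": short_names, "Description": full_questions})
print(question_key.to_string(index=False))
```

Building the table from two parallel lists keeps each shorthand next to its question as long as both lists are in the same order.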

Cell #7 and #8

Cell 7 defines terms used in cell 8. Together, the two cells invert the answers to some of the questions. For example, "idx1" is shorthand for the command "np.where(vec==1)". Given a single condition, ".where" returns the positions where that condition is true, here every position where the answer is 1. Then ".put" writes a new value into those positions. For example, in this code, if the answer to a certain question is 1, the number 5 will be displayed instead. Cell 8 then creates a chart with the updated responses.
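A sketch of the where/put inversion on one hypothetical column of 1 to 5 answers. It shows why the index terms are found before anything is changed: if the 1s were turned into 5s first, a later search for 5s would flip them straight back. The last lines show an equivalent one-step shortcut (new = 6 - old) that only holds for a 1 to 5 scale:

```python
import numpy as np

# Hypothetical answers to one reverse-worded question on a 1-5 scale
vec = np.array([1, 2, 5, 3, 1, 4])

# Find all positions FIRST, before changing anything. With one argument,
# np.where returns a tuple of index arrays, so [0] picks out the array.
idx1 = np.where(vec == 1)[0]
idx2 = np.where(vec == 2)[0]
idx4 = np.where(vec == 4)[0]
idx5 = np.where(vec == 5)[0]

# np.put(array, indices, value) overwrites those positions in place
np.put(vec, idx1, 5)
np.put(vec, idx2, 4)
np.put(vec, idx4, 2)
np.put(vec, idx5, 1)
# 3 is the middle of the scale, so it stays 3

# On a 1-5 scale the same inversion is a single arithmetic step
original = np.array([1, 2, 5, 3, 1, 4])
inverted = 6 - original
```

After the puts, vec is [5, 4, 1, 3, 5, 2], the same as 6 - original.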

Note on errors

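If an error pops up even though the correct syntax is being used, try running all the previous cells and then running that cell again. A cell can only use variables and functions from earlier cells after those cells have been run in the current session.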

Cell #6

This cell creates a chart with the questions as column headers and the responses in the columns below them, using the command "ConfidenceS#.loc". The ".loc" command comes from the pandas package and selects rows and columns by their labels; here ':' keeps every row and the list keeps only the named columns.
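A short sketch of .loc[rows, columns] on a hypothetical two-row table that reuses a few of the real column names; the "Unused column" is made up to show that anything not listed is left out:

```python
import pandas as pd

# Hypothetical two-row survey table
ConfidenceS14 = pd.DataFrame({
    "Timestamp": ["5/1/2014", "5/2/2014"],
    "First Name": ["Ana", "Ben"],
    "Last Name": ["Lopez", "Kim"],
    "I am confident I can do well in STEM": [4, 5],
    "Unused column": [None, None],
})

# .loc[rows, columns] selects by label: ':' keeps every row,
# and the list keeps only the named columns, in the listed order
cols = ["First Name", "Last Name", "I am confident I can do well in STEM"]
subset = ConfidenceS14.loc[:, cols]
print(subset)
```

Because .loc works with labels, every name in the list has to match a column header exactly, including spacing and punctuation.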

Cell #5

Cell 5 changes the names of the columns to the timestamp, name, school name, and then the questions, using the command "ConfidenceS#.columns =". The new column labels are placed after the equals sign. This setup lets the numerical answer to each question be displayed in the column under that question.
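A sketch of the two common ways to relabel columns in pandas, on a hypothetical one-row table: assigning a whole list to .columns (what cell 5 does) and rename(), which changes only the headers it is given:

```python
import pandas as pd

# Hypothetical raw export with form-style headers
raw = pd.DataFrame([["5/1/2014", "Ana", "School A", 4]],
                   columns=["Timestamp", "First Name", "Name of School", "Q1 text"])

# Assigning a list to .columns replaces ALL headers at once; the list
# needs exactly one entry per existing column, in the same order
raw.columns = ["Timestamp", "First Name", "School", "Confidence Q1"]

# rename() changes only the headers named in the dict and leaves the rest alone
renamed = raw.rename(columns={"Confidence Q1": "Q1"})

# A list with the wrong number of names is rejected
try:
    raw.columns = ["Timestamp", "First Name"]
    length_error = False
except ValueError:
    length_error = True
```

If the new list is shorter or longer than the number of columns, pandas raises a ValueError instead of guessing, which is one way to catch a missing question label.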

Cell #4

The command "confS#" creates a chart displaying the questions and responses for the terms "School," "Questions," and "Students" that were defined in cell #1.

Cell #3

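This cell uses the function "clean_Dupli_NaN(vector)" to delete duplicate surveys, with "ConfidenceS#" passed in as the vector. Assigning the result back to "ConfidenceS#" replaces the data frame with a new one that has no duplicate responses. Typing "ConfidenceS#.shape" then shows the new (rows, columns) count, which is a quick way to check that the duplicates were removed.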
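The full body of clean_Dupli_NaN is not shown in these notes, so the function below is a hypothetical stand-in based on its name and the "Delete duplicate rows and NaN columns" comment: drop repeated (Last Name, First Name) rows, then drop columns that are completely empty. It prints .shape before and after to show that .shape only reports the size:

```python
import numpy as np
import pandas as pd


def clean_dupli_nan_sketch(df):
    """Hypothetical stand-in for clean_Dupli_NaN: keep the first response per
    (Last Name, First Name), then drop columns that are entirely NaN."""
    df = df.drop_duplicates(subset=["Last Name", "First Name"])
    return df.dropna(axis=1, how="all")


# Hypothetical survey with one repeated student and one empty column
survey = pd.DataFrame({
    "First Name": ["Ana", "Ana", "Ben"],
    "Last Name": ["Lopez", "Lopez", "Kim"],
    "Q1": [4, 4, 5],
    "Empty": [np.nan, np.nan, np.nan],
})

print(survey.shape)    # (3, 4) before cleaning
cleaned = clean_dupli_nan_sketch(survey)
print(cleaned.shape)   # (2, 3): one duplicate row and one empty column removed
```

Note that drop_duplicates keeps the first copy by default, so if a student submitted twice, the earlier response is the one that stays.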

Cells 1-2

Cell #1: Terms used throughout the code are defined in this cell. The terms size, name, year, school, students, and questions are defined.
"School" gives the name of the school.
"Students" gives the first and last names of the students.
"Size" stores the survey's shape, that is, how many rows and columns of data are in the spreadsheet.
"shape" gives (# of rows, # of columns).
"list_Students" returns the same data as "Students".
"list_Questions" returns the same data as "Questions".
"clean_Dupli_NaN(vector)" erases duplicate survey responses from the same person.
Cell #2: Different packages are imported and given short names.
"plt" means matplotlib.pyplot is being used. matplotlib.pyplot creates plots from the data.
"np" means numpy is being used. Numpy is used for multidimensional arrays of numbers.
"pd" means pandas is being used. Pandas structures data and its labels.
"coll...
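A sketch contrasting pandas .shape and .size, followed by a trimmed-down version of the Cell #1 class. The class here drops the limit parameter and uses only two student columns so it runs on a tiny hypothetical table; it illustrates the idea and is not the original code:

```python
import pandas as pd

# Hypothetical 3-response survey with 4 columns
survey = pd.DataFrame({
    "Timestamp": ["t1", "t2", "t3"],
    "First Name": ["Ana", "Ben", "Cy"],
    "Last Name": ["Lopez", "Kim", "Diaz"],
    "Q1": [4, 5, 3],
})

print(survey.shape)  # (3, 4): (rows, columns)
print(survey.size)   # 12: total number of cells, rows * columns


class Survey:
    """Trimmed-down sketch of the Cell #1 class (the original has more fields)."""

    def __init__(self, Fsurvey, name, year, end):
        self.Size = Fsurvey.shape  # stores (rows, columns), despite the name
        self.Name = name
        self.Year = year
        self.Students = Fsurvey.loc[:, ["First Name", "Last Name"]]
        self.Questions = Fsurvey.columns[0:end]

    def list_Students(self):
        return self.Students  # same data as the Students attribute

    def list_Questions(self):
        return self.Questions


s14 = Survey(survey, "Confidence", 2014, end=4)
```

In pandas, .size means the total cell count, so a name like Shape for the attribute would avoid confusion with survey.size.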

May 24, 2017 -- SienaSemanticsSurvey -- Code Breakdown -- Cell #16 & #17

Graphs only generate successfully when the Gender choice in Cell #10 is "". Any other choice produces an error saying that 'Gender' is not properly defined, and the bar graphs can't be produced. The graphs divide the data so that the first graph (Graph Part1) includes the data for the Science and Math questions from each year's semantics survey, and the second graph (Graph Part2) includes the data for the Engineering, Technology, and Career questions from each year's semantics survey.
_________________________________________________________________________________
Cell #16 "Graph Part1 - Semantics in STEM (Science and Math) for S14, S15, S16, S17"
NTitle --> "variable" that the title "SEMANTICS (SCIENCE AND MATH) IN STEM" is saved as, and is eventually called in the plot as title=NTitle. The cell calls each year's data for the first ten rows of the tables that were created and all of the...
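A minimal sketch of the title-variable pattern with a pandas bar chart, assuming the data is a table with one row per question and one column per survey year. The scores are made-up numbers purely so the example runs, the question names are hypothetical, and the split uses two rows instead of the first ten:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up average scores; rows are questions, columns are survey years
scores = pd.DataFrame(
    {"S14": [3.1, 3.4, 2.9, 3.8], "S15": [3.3, 3.5, 3.0, 3.6]},
    index=["Science 1", "Math 1", "Engineering 1", "Career 1"],
)

# The title string is stored once in a variable and passed in as title=NTitle
NTitle = "SEMANTICS (SCIENCE AND MATH) IN STEM"

# Split the rows into two groups, like Graph Part1 and Graph Part2
part1 = scores.iloc[0:2]  # first rows: science and math questions
part2 = scores.iloc[2:]   # remaining rows: engineering and career questions

# DataFrame.plot draws one group of bars per row and returns the Axes
ax = part1.plot(kind="bar", title=NTitle)
plt.close("all")  # in the notebook, plt.show() would display the chart
```

Keeping the title in a variable means Part2 can reuse the same plotting line with a different title string.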