Institute Hosts “Computational Tools for Data Science” Student Poster Session

The Hariri Institute hosted a student poster session this Wednesday, December 14th from 1-5PM, featuring 35 group projects completed by students enrolled in the “Computational Tools for Data Science” (CS 505) fall course taught by Professor Crovella.

Starting at 1PM, students from CS 505 presented posters describing data-science experiments studies they have conducted. The purpose of the course project was to analyze a real data set using python to generate predictions and draw conclusions from trends they discovered in their data. Course projects dealt with datasets derived from companies and non-profit organizations such as Amazon, Hubway, NFL, and U.S Equal Employment Opportunity Commission. Projects included, using data to try and predict bias in Amazon consumer reviews, job type based on gender and type of Hubway customer based on the user’s incoming cycling speed. The following excerpts highlight several of the course projects displayed at the poster session.

Juniors and Computer Science Majors, Brian Siao Tick Chong and Freddie Vargas, used data-visualization software to better understand how GitHub communities are structured online. Specifically, they wanted to focus on, “who is in the communities and what their focus is”. Through their data science experiment, the pair was able to discover that the communities on Github are large and well-connected. This is contrast to their original prediction that there would be more autonomous users. Through their research, the pair was also able to find that there are anomalous users in the Github community that are highly influential due to their abnormally high amount of connections to others users and projects. The pair noted that their favorite part of the project was being able to work with such a large amount of data, “usually you don’t to work with a lot of data” Chong explained.
dsc03649-1
Seniors and Computer Science Majors, Kai Bernardini and Sandra Lund Lefdal, used an Amazon dataset to build a sentiment analyzer that could determine the level of bias in an Amazon review. Their sentiment analyzer, which used contraction and classification methods, was able to create a model that was 90% accurate in predicting the bias of consumer reviews. To exemplify how their project could be applied to other disciplines, the pair also used the sentiment analyzer determine the bias of Donald Trump’s tweets, “You can use a sentiment analyzer to determine whether any arbitrary text is overly positive or negative,” Bernardini explained.

Seniors and Computer Science Majors, Sarah Medeiros and Samantha Karam, ran linear regression algorithms on their data collected from the U.S Equal Employment Opportunity to find correlations between race, gender, and job type based on skill. Through their experiments, they were able to conclude that one can accurately predict job type based on race and gender, but not on year. The two women hope to continue their research to better understand the inequalities between job opportunities in regard to race and gender, “I just wish I could better understand why, these disparities exist,” said Madeiros.

dsc03651