Concordia Shanghai



Congratulations Big Data Students!

Eight of our students presented virtually at the 2021 4th International Conference on Big Data and Education (ICBDE 2021) on February 3-5, 2021. Once again, we were the only high school presenting at this international big data conference, alongside graduate students, researchers, and professors from all over the world.

 

This year our students did exceptionally well, and the following students won the Best Presentation Award:

Alex G. on “Tennis Analytics: A New Age of Strategic Matchplay”

 

Cyrus H. on “Data Storage: Big Data and Blockchain”
Check out Cyrus’s video on “Blockchain in 5 mins”, https://www.youtube.com/watch?v=Qb29cleH7OI&t=177s

 

Athena Ru on “Social Capital in United States: In Big Data We Trust”
Check out Athena’s videos on “IBM Cognos Analytics”, https://www.youtube.com/channel/UCdvhKARGLwZG-tJdCsbzHCA/videos

 

Yena Kim on “Fighting Against Political Corruption with Big Data Analytics”

 


 

Below are all of the abstracts from our presenting students. Please join us in congratulating every student for their immense accomplishment of presenting at this prestigious and advanced conference!

Athena R. on Social Capital in United States: In Big Data We Trust

Social capital refers to networks, shared norms, values, and understandings that facilitate co-operation within or among groups. The term “social capital” first grew to prominence when acclaimed American political scientist Robert Putnam highlighted its significance in his book Bowling Alone: The Collapse and Revival of American Community, in which he stated that social capital needed to be rebuilt in order to restore American community life. Essentially, social capital is a measure of trust in a given society. This study investigates the correlation between social capital and multiple social factors: population, voter turnout, area classification, violent crime rate, religion, ethnic diversity, and community health in the United States. The datasets are derived from the Penn State Department of Economics, Sociology, and Education, the United States Congress Joint Economic Committee, Dr. Rand Olsen (lead scientist at Life Epigenetics), and the United States Department of Agriculture. IBM Cognos Analytics was then used to analyze the 13,000+ data entries from 1997 to 2018. The resulting data visualizations showed how social capital changed over time as well as the factors that drove it. Notably, IBM Cognos Analytics revealed that the Midwest maintained the highest levels of social capital and that areas of higher social capital generally corresponded with lower ethnic diversity. This research is highly applicable to the field of sociology. In a globalized world, no one is truly self-sufficient. As human interaction ultimately shapes society, it is increasingly important to consider how factors such as social capital reflect the current state of affairs.

 

Alex G. on Tennis Analytics: A New Age of Strategic Matchplay

With the technological innovations of the twenty-first century, the idea that big data analytics can help athletes achieve better results has been explored in many areas of the sports world. Most people turn to Michael Lewis’s Moneyball, a study of how analytics built a superior baseball team; however, experts have started to explore analytics in other sports as well. Tennis has long been a game of precision and numbers. This study uses big data analytics to analyze aspects of the game such as service, groundstrokes, and even court surfaces to optimize one’s chances of winning a tennis match. The study uses data directly from ATP tennis matches starting from January 2019, generated by tennis analyst Jeff Sackmann on Kaggle.com, with up to 20,000 data points collected to explore the effects of tennis statistics. IBM’s Cognos Analytics shows that, of the 12 categories analyzed, break points faced, first-serve points, and second-serve points are the strongest determinants of a player’s chance of winning a tennis match; surface type also influences these categories. A player’s data from these three categories can be used to predict the player’s likely win in any match. By analyzing even more data from previous years, this study will be able to generate a complete model to predict a player’s win percentage.

 

Chieh-JU (Jerry) L. on Understanding the Sneaker Resale Market Through Data Analytics

Ever since NBA Hall of Famer Michael Jordan took the world by storm with his notorious Nike "Banned" Jordan 1s in 1984, sneaker culture has flourished all across the world. In particular, the resale market for sneakers has grown tremendously in the past ten years. In the midst of its expansion, the resale market has also become increasingly volatile, with prices fluctuating more than ever. Analyzing over 90,000 data points from the online resale platform StockX, this study evaluates the resale market as a whole, with particular emphasis on the driving factors of resale value. Using IBM Cognos Analytics in conjunction with traditional means of statistical analysis, the study considers a wide range of components within the resale market, including release quantity, buyer location, and release date. In doing so, it identifies three commanding factors of resale value: scarcity, time, and influencers. Holding the strongest correlations with resale price, these three factors have immense control over the sneaker industry. To make sense of these findings, this study performs case studies on a multitude of sneaker models. With further validation and more comprehensive research, the findings could be developed into an effective model that predicts resale values prior to sneaker releases. As the resale market continues to grow, such information will be invaluable.

 

Daisy J. on Educating Middle School Students About Big Data Analytics Through Picture Book

In the digital world we live in today, data analytics penetrates every aspect of our lives, from personalized advertising to adaptive learning systems. Despite its importance in this internet-based world, education in big data analytics is accessible only to a fraction of high school and college students. Therefore, it is important to expand education on data analytics to a wider audience, especially middle school students, in order to inform them about the general concepts and applications of big data, which will be extremely useful in preparing them for higher-level big data analytics courses. This project simplifies the basics of data analytics into concise and simple language and converts them into a picture book that appeals to middle school students. The examples illustrated in the picture book, such as online shopping and online courses, are highly relevant to students’ daily lives, so students can easily understand the application of data analytics in a practical way. Furthermore, this book uses analogies to help students visualize abstract ideas. For instance, it explains the significance of n=all by comparing it to the process of sketching an engineering drawing of a mug: the more data there is, the more profound our understanding of the mug is. Although this picture book is a pilot project, it will be shared with broader student groups upon further revision.

 

Yena K. on Fighting Against Political Corruption with Big Data Analytics

Political corruption is found throughout the globe; it takes a variety of forms, such as bribery, lobbying, and embezzlement. Corruption has an immense negative impact on people’s lives, both directly and indirectly, in a variety of areas such as the economy and education; thus, action should be taken to tackle political corruption. In order to take effective action against corruption, the degree of corruption should be measured and its relationship with other social factors should be identified. This study analyzes the correlation between perceived corruption, human development, democratic status, political terror, and average years of schooling to identify the causes and effects of political corruption. Data for each factor were collected from the majority of countries for the years 2012 to 2018 and analyzed through visualization using IBM Cognos Analytics. The data visualizations indicate strong correlations. Notable findings are that the higher human development is and the more educated people are, the lower the degree of perceived corruption, and that democratically free countries tend to have a lower degree of political corruption. These findings could be applied in the future to combat political corruption: improving education systems and ensuring democracy could be the key to dealing with the consequences of political corruption.

 

Amy L. on Lung Cancer: Big Data with IBM Cognos Analytics on Air Pollution in Spain

Cancer is one of the most common diseases worldwide and the second leading cause of death globally; among cancers, lung cancer has the highest mortality rate of all. Genes are not the only factor contributing to the risk of getting lung cancer, as external factors such as diet and smoking habits also play a major role. Individuals often associate exposure to contaminated air with lung conditions or respiratory tract infections; however, Big Data Analytics has shown a strong, direct relationship between the presence of air pollutants and the risk of lung cancer. Using datasets from Kaggle.com and a data table from Sciencedirect.com, this study uses IBM Cognos Analytics to illustrate the trends. The datasets from Kaggle.com give the levels of several air pollutants in Madrid, Spain. Eight of the pollutants, namely benzene (C6H6), carbon monoxide (CO), non-methane hydrocarbons (NMHC), nitrogen dioxide (NO2), ozone (O3), particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), and sulfur dioxide (SO2), were selected because they have previously been shown to affect the human lung the most. The data table from Sciencedirect.com shows the lung cancer mortality rates in Spain from 2001 to 2013. The trends in the levels of air pollutants and the lung cancer mortality rates over this period were compared and displayed a directly proportional relationship. Furthermore, the levels of the eight pollutants at different times of the day were plotted, showing that most peak between about 10 a.m. and 12 p.m. This analysis is useful in determining the optimal time periods for outdoor air exposure.

 

Cyrus H. on Data Storage: Big Data and Blockchain

Data loss and corruption are prevalent problems that plague Big Data. If files and data are lost, valuable time must be spent recovering them. Although backups can be made, they may not be up to date. A study revealed that more than half of organizations have lost critical applications or data centers for hours or even days at a time. Worse still, what if data has been tampered with by malicious actors and the tampering goes unnoticed? Organizations can store data in several locations, but then how would they deal with discrepancies in the data? This project looks into a method of data storage that could potentially solve all of these problems: Blockchain. In essence, Blockchain is a distributed form of data storage in which multiple computers hold an up-to-date, identical copy of all the data at the same time. Data cannot be edited once added. With Blockchain, one can store, verify, and readily access huge amounts of data held in multiple locations, with no discrepancies among the sources, as they are all synchronized. Whether the cause is a software, network, or human error or a malicious actor, Blockchain can quickly find and fix errors or changes in the data. Moreover, as long as one computer stays online, the data is accessible and available. Thus, Big Data and Blockchain go hand in hand. The presentation will explain how Blockchain works, how it can be adapted for Big Data, and its benefits and possible applications.
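
To illustrate the idea that data "cannot be edited once added" without the change being noticed, here is a minimal Python sketch of a hash chain, the core data structure behind a blockchain. It is not taken from the presentation: the function names (block_hash, append_block, first_invalid_block) and the sample records are hypothetical, and a real blockchain would add networking and a consensus protocol on top of this. Each block stores the hash of the previous block, so changing any earlier record breaks every later link.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block that commits to the hash of the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def first_invalid_block(chain):
    """Return the position of the first block that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return position
        prev_hash = block["hash"]
    return None


if __name__ == "__main__":
    # Hypothetical records, purely for illustration.
    chain = []
    for record in ["record A", "record B", "record C"]:
        append_block(chain, record)

    # A second computer holding an identical copy of the data.
    replica = copy.deepcopy(chain)

    # Simulate tampering (or corruption) on the first computer.
    chain[1]["data"] = "record B (altered)"

    print(first_invalid_block(chain))    # position of the altered block
    print(first_invalid_block(replica))  # the untouched copy still verifies
```

In this simplified picture, the "fix" described in the abstract amounts to discarding the copy that fails verification and restoring it from a copy that still verifies; how real networks agree on which copy is authoritative is outside the scope of this sketch.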