Selected Projects in Data Science
Forecasting Cryptocurrency Prices Using Machine Learning: An Analysis of Reddit Discussions
- By leveraging techniques like Exploratory Data Analysis (EDA), Natural Language Processing (NLP), and Machine Learning (ML) on a big data scale, we aim to uncover hidden patterns and trends that can shed light on the realities of investing in the crypto markets.
- Developed data pipelines on AWS for a dataset of 6 million rows, using Spark NLP for sentiment analysis, LSTM and ARIMA for time-series prediction, and SynapseML for causal inference to model the relationship between sentiment trends and prices.
code
website
Quantifying the Complex Relationship between Lyrics, Chord Progression, and Emotion Stimulation: Examining Song Features
- Utilizes Bag of Words and Word2Vec methods, combined with KNN/hierarchical clustering, to identify songs with similar lyrics. Incorporates LDA and sentiment analysis for deeper lyrical understanding. Explores relationships between numerical (tempo, chord progression, key) and textual features using supervised machine learning.
- Trained models to understand the connections between musical elements, improving recommendation accuracy using machine learning techniques and textual features.
paper
code
website
blog
Build a song similarity calculator to improve the recommendation system
- Features an interface for a song similarity calculator that evaluates both lyrical and numerical attributes of songs. Utilizes cosine similarity for comparison, displaying results through an overlapping radar graph for visual analysis.
whitepaper
code
application
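As an illustration of the cosine similarity comparison mentioned above, here is a minimal sketch in Python. It is not the project's code: the feature names and values are hypothetical, and it assumes each song has already been reduced to a numeric vector on a common scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical, already-scaled features:
# [tempo, energy, valence, lyric_sentiment]
song_a = [0.62, 0.80, 0.45, 0.30]
song_b = [0.60, 0.75, 0.50, 0.35]
song_c = [0.10, 0.20, 0.90, 0.95]

print(cosine_similarity(song_a, song_b))  # close to 1: similar profiles
print(cosine_similarity(song_a, song_c))  # noticeably lower
```

Because cosine similarity compares direction rather than magnitude, the features should be scaled consistently before comparison; otherwise a single large-valued attribute such as raw tempo would dominate the score.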
Network Analysis of Twitter to Identify Opinion Leaders, Emotional Cascades, and Community Structure
- Applied k-core decomposition to examine the community structure, and applied NLP techniques, including Named Entity Recognition, sentiment analysis, and topic modeling, to tweets to investigate the emotional cascade.
paper
code
slides
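The k-core decomposition mentioned above can be sketched in a few lines of Python. This is a simple peeling version written for illustration, not the project's implementation; the account names and the toy graph are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def core_numbers(adjacency):
    """Return the core number of every node.

    adjacency maps each node to the set of its neighbours
    (undirected graph, no self-loops).
    """
    degree = {node: len(neigh) for node, neigh in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its remaining neighbours.
        for neigh in adjacency[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


# Hypothetical reply graph: a triangle of accounts, one peripheral
# account attached to it, and one isolated account.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}

cores = core_numbers(graph)
print(cores)
```

Nodes with the highest core number form the densely connected centre of the network. For large graphs a library routine is more practical than this quadratic-time sketch.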
- Used PCA and UMAP to visualize the participants' stances on a two-dimensional map, K-means to cluster and classify groups A and B, and the distance between cluster centroids to measure the separation between the two groups.
paper
code
slides
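The centroid distance step described above can be illustrated as follows. This sketch assumes the two-dimensional coordinates and the group labels are already available (for example, from a projection and a clustering step); the points and labels are made up for illustration and are not data from the project.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(points, labels, group_a="A", group_b="B"):
    """Euclidean distance between the centroids of two labelled groups."""
    a = [p for p, label in zip(points, labels) if label == group_a]
    b = [p for p, label in zip(points, labels) if label == group_b]
    return math.dist(centroid(a), centroid(b))


# Hypothetical 2-D embedding of four participants and their cluster labels.
points = [(0.0, 0.0), (0.0, 2.0), (3.0, 4.0), (3.0, 6.0)]
labels = ["A", "A", "B", "B"]

distance = centroid_distance(points, labels)
print(distance)
```

A larger distance suggests the two groups sit further apart on the map. Distances measured in a UMAP embedding are harder to interpret than those in a PCA projection, so the comparison is best read as relative rather than absolute.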
Predicting Attitudes towards UBI in the EU using Supervised Machine Learning Techniques
- Built and trained Logistic Regression, Decision Tree, SVM, Random Forest, XGBoost, and GBDT models to identify 5 primary indicators and 35 secondary indicators as the key factors influencing EU citizens' attitudes towards UBI.
paper
code
slides
Using DID to Evaluate the Impact of Intensive Case Management Services
- Designed a Difference-in-Differences (DID) analysis to construct a quasi-experimental setting for causal inference and quantify the impacts of the services.
paper
code
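For readers unfamiliar with DID, the basic two-group, two-period estimator can be sketched as below. This is a textbook illustration only, not the model used in the project: the outcome values are invented, and a real analysis would typically use a regression with covariates and check the parallel-trends assumption.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences estimate.

    rows is an iterable of (treated, post, outcome) tuples, where
    treated and post are 0 or 1. Every one of the four cells must
    contain at least one observation.
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(1, 1)] - m[(1, 0)]
    control_change = m[(0, 1)] - m[(0, 0)]
    return treated_change - control_change


# Hypothetical outcomes: (treated, post, outcome)
rows = [
    (1, 0, 10), (1, 0, 12),  # treated group, before
    (1, 1, 15), (1, 1, 17),  # treated group, after
    (0, 0, 9), (0, 0, 11),   # control group, before
    (0, 1, 11), (0, 1, 13),  # control group, after
]

effect = did_estimate(rows)
print(effect)
```

In this toy data the treated group's mean rises by 5 and the control group's by 2, so the estimate attributes the remaining 3 to the treatment, under the assumption that both groups would otherwise have followed the same trend.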
Can Remittances Compensate for Parental Absence? Evaluated by Psychological Well-being and Educational Outcomes
- Developed a multivariate model in Stata incorporating Difference-in-Differences and Propensity Score Matching.
- Estimated the indirect effects and direct impacts of remittances in the context of parental work-related migration on the well-being and academic achievements of left-behind children.
paper
code
George Floyd protests’ impact on the election result
- Employed fixed-effects regression, treating county characteristics as time-invariant variables, to examine the influence of the protests on the election result.
paper
code
Review of a Dynamic Model of Housing Demand: Estimation and Policy Implications
- Investigates household housing demand in the face of income and house price shocks. Uses the two-stage method of Bajari et al. (2013), with discrete-state dynamic programming and fmin search as the numerical methods.
paper
code
Selected Articles on Policy Suggestion
Congressional Policy towards U.S. Asian Policy
Decentralized Web as Public Sphere
paper
- Your taste, your deepest fear, and your dream are, in a sense, shaped by the allocation of your attention, which is disproportionately allocated to commercial advertising, where capital intentionally directs it. This can be a silent process of penetration, as depicted in The Society of the Spectacle by Guy Debord, in which consumer culture and commodity fetishism affect individual thinking and decision-making only subconsciously.
paper
What if we have a Digital Agora?
paper
Strategies for Government Relations in the US and EU, to the VP for Governmental Relations at Facebook
paper
Strategies for the Green Party, to Robert Habeck, Vice Chancellor, Federal Minister for Economic Affairs and Climate Action, Co-leader of the Green Party
paper
- Divisions over multiple issues within the current traffic-light coalition, the rise of the AfD, and the pervasiveness of anti-globalization, populism, and polarization could make it harder for the Greens to maintain their position in the coalition. Given the current global economic recession and the erosion of democracy, the Greens should be prepared to forgo some universal basic values, adopt more realistic foreign policies, and advocate for strengthening national power, while emphasizing that the party represents the future and hope.
Strategies to Constrain the Cigarette and Vaping Industries to the President of the United States, Joseph Biden
paper
- This memorandum suggests using COVID-19 as a window of opportunity and pursuing a three-pronged political strategy to constrain the cigarette and vaping industries: reframing the debate as a movement toward greater ethnic equity and the realization of national values, building an advocacy coalition, and discrediting front groups.
Page template forked from evanca