My Bui (Mimi)

Biz Dev who also does DS and ML
Torture data to see the truth
Never being either just lucky or wrong

My CV
My LinkedIn

#python, #scala, #java, #sql, #nosql
#spark, #dask, #mongodb, #postgresql, #ssis
#scikitlearn, #tensorflow, #keras
#aws, #salesforce, #sap, #qlik, #powerbi, #tableau

Mimi’s Portfolio


1. Machine Learning

Collaborative filtering recommender system with PySpark: Amazon products

Ensemble Learning with RandomForest and XGBoost: speed dating data set

Dimensionality Reduction: 1,000 Fashion-MNIST samples

Support Vector Machine: classification of graduation and regression of admission data

Supervised learning: regression and classification of math final grades

Multivariate linear regression: recommended house prices


2. Data Science

Spark Structured API: book reviews and user data

Yelp web scraping: top barbers in California

Spotify API: Adele and her artwork

Large data set analysis with Dask and Plotly: NYC parking ticket violations

Reddit API: Subreddit Python

World airports, airlines and their routes

Income and religion in the US

Sales project: data cleaning and analysis

Most popular posts and golden hour of publishing comments

Gender gap in STEM degrees

Titanic gender vs. survival


3. Data Engineering

Data Pipeline with Python and SQLite for research analysis: final results as a report (sample only)

Automate writing JSON data to CSV, and back

Database model of a university: queries

Database model of a university: SQL DB setup

Database model of a university: class UML and relational models

Theaters and ticket systems UML


[To be continued with upcoming DS and ML projects]