My Bui (Mimi)

Data Engineer & DataOps

My LinkedIn
My GitHub

Mimi’s Hobby DE Work


QWERTY keyboard typing with one finger: modeling and analysis using Fitts’s law and Zipf’s law

Apache Hadoop for analytics: MapReduce, Pig, Spark, Hive

Collaborative filtering recommender system with PySpark: Amazon products

Ensemble learning with Random Forest and XGBoost: speed dating data set

Data Pipeline with Python and SQLite for research analysis: final results as a report (sample only)

Hypothesis testing of differences in sample means using a 1,000-permutation test and a t-test: leadership analysis

Statistical testing and fossil analysis: detecting patterns of speciation in time and space

Database model of a university: queries

Database model of a university: SQL DB setup

Database model of a university: class UML and relational models

Dimensionality reduction: 1,000 Fashion-MNIST images

Support Vector Machine: classification of graduation and regression of admission data

Supervised learning: regression and classification of math final grades

Spark Structured API: book reviews and user data

Yelp web scraping: top barbers in California

Spotify API: Adele and her artwork

Large data set analysis with Dask and Plotly: NYC parking ticket violations

Reddit API: the r/Python subreddit

Automating conversion of JSON data to CSV and back

Multivariate linear regression: recommended house prices

World airports, airlines and their routes

Income and religion in the US

Sales project: data cleaning and analysis

Most popular posts and the golden hour for publishing comments

Gender gap in STEM degrees

Titanic: gender vs. survival

Theaters and ticket systems UML