Data Engineer & DataOps
My LinkedIn
My GitHub
QWERTY keyboard typing with 1 finger: modeling and analysis using Fitts’s law and Zipf’s law
Apache Hadoop for analytics: MapReduce, Pig, Spark, Hive
Collaborative filtering recommender system with PySpark: Amazon products
Ensemble Learning with RandomForest and XGBoost: speed dating data set
Data Pipeline with Python and SQLite for research analysis: final results as a report (sample only)
Statistical testing and fossil analysis: detecting patterns of speciation in time and space
Database model of a university: queries
Database model of a university: SQL DB setup
Database model of a university: class UML and relational models
Dimensionality Reduction: 1000 Fashion-MNIST
Support Vector Machine: classification of graduation and regression of admission data
Supervised learning: regression and classification of math final grades
Spark Structured API: book reviews and user data
Yelp web scraping: top barbers in California, CA
Spotify API: Adele and her artwork
Large data set analysis with Dask and Plotly: NYC parking ticket violations
Automate writing JSON data to CSV, and back
Multivariate linear regression: recommended house prices
World airports, airlines and their routes
Sales project: data cleaning and analysis