Projects

Predicting Dissolved Oxygen Levels with a SARIMAX Time Series Model

Source: Yinan Chen [Public Domain]

Dissolved oxygen is a good indicator of stream health. Using measured dissolved oxygen levels from a relatively healthy stream, I fit a SARIMAX time series model to forecast future levels. Readings that depart from the forecast can then be used to flag changes in the health of the stream.

SKILLS AND TOOLS USED:
  • Data Analysis: Pandas, Matplotlib, PACF, ACF
  • Time Series Modeling: SARIMAX

Check it out on GitHub

ImageGEO: A Data Collection Tool using APIs and Flask


During the recovery phase immediately following a disaster, the Federal Emergency Management Agency (FEMA) performs damage assessment “on the ground” to assess the level of damage caused to residential parcels and to critical infrastructure. To ensure an accurate estimate of the damage, it is important to understand the condition of the structures prior to the event.

To guide damage assessment efforts following a disaster and to help surveyors identify the structures of interest, this tool (a web app or a mobile app) accepts a photo as input and retrieves the address, the Zillow price estimate, and Google Street View screenshots of the structure. The tool also includes a damage assessment form which, in addition to recording the level of damage to the structure, provides a pre-event photo of the assessed structure.

SKILLS AND TOOLS USED:
  • API: Zillow, Google Maps, Bing
  • Web App: Flask, Dash
  • Data Collection: Pandas, CSV

Check it out on GitHub

Baking vs Cooking: Natural Language Processing Classification

Using natural language processing, I analyzed the cooking and baking subreddits and built a support vector classification model that predicts which of the two subreddits a piece of text came from with 88% accuracy.
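A minimal sketch of this kind of workflow, assuming a TF-IDF representation feeding a linear support vector classifier tuned with a grid search. The toy posts, the parameter grid, and the helper names `build_search` and `train` are hypothetical and are not taken from the project, which used scraped subreddit data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy posts standing in for scraped subreddit text.
BAKING = [
    "my sourdough loaf did not rise in the oven",
    "how long should I proof bread dough before baking",
    "best flour for chewy chocolate chip cookies",
    "cake sank in the middle after baking",
    "can I swap baking soda for baking powder in muffins",
    "croissant dough butter keeps leaking when I bake",
    "pie crust shrinks in the oven every time",
    "how to get a crisp crust on a baguette",
]
COOKING = [
    "how do I sear a steak in a cast iron pan",
    "my stir fry vegetables always turn out soggy",
    "best way to season a chicken soup",
    "how long to simmer tomato sauce for pasta",
    "tips for grilling salmon without sticking",
    "what oil is best for frying onions",
    "how to make a pan sauce after roasting pork",
    "my rice is mushy when I boil it",
]


def build_search():
    """Vectorizer + SVM pipeline wrapped in a small grid search."""
    pipeline = Pipeline(
        [
            ("tfidf", TfidfVectorizer(stop_words="english")),
            ("svm", SVC(kernel="linear")),
        ]
    )
    # Illustrative grid only; a real search would cover more values.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "svm__C": [0.1, 1.0],
    }
    return GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")


def train(texts, labels, seed=42):
    """Hold out a test split, tune on the rest, and report held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    search = build_search()
    search.fit(X_train, y_train)
    return search, search.score(X_test, y_test)


if __name__ == "__main__":
    texts = BAKING + COOKING
    labels = ["baking"] * len(BAKING) + ["cooking"] * len(COOKING)
    search, accuracy = train(texts, labels)
    print(search.best_params_, accuracy)
```

Keeping the vectorizer inside the pipeline means it is refit on each cross-validation fold, so vocabulary from the validation posts does not leak into training.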

SKILLS AND TOOLS USED:
  • Web scraping and APIs for data gathering
  • Natural Language Processing: Scikit-learn modules CountVectorizer, TfidfVectorizer
  • Machine learning models: Scikit-learn LogisticRegression, Naive Bayes, RandomForestClassifier, Support Vector Machine
  • Training and testing models: Scikit-learn modules Pipeline, GridSearchCV, train_test_split

Check it out on GitHub

Predicting Housing Prices: Multivariable Linear Regression

high angle shot of suburban neighborhood
Photo by David McBee on Pexels.com

Given 79 explanatory variables describing residential homes in Ames, Iowa, can we create a model that accurately predicts house prices? Using a lasso linear regression, I predicted the prices of houses in Ames, Iowa with 90% explained variance. I also used regression to impute null values.
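To show the general shape of a lasso fit, here is a small sketch on synthetic data with 79 features. It assumes standardized inputs, a cross-validated penalty via `LassoCV`, and R² as the measure of variance explained. The generated data and the helper name `fit_lasso` are assumptions for the example; the Ames dataset and the project's own preprocessing are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_lasso(X, y, seed=0):
    """Fit a scaled lasso model and report held-out R^2 and kept features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # Scaling first matters because the L1 penalty treats all coefficients alike.
    model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=seed))
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    # Lasso shrinks some coefficients exactly to zero; count the survivors.
    n_kept = int(np.sum(model.named_steps["lassocv"].coef_ != 0))
    return model, r2, n_kept


if __name__ == "__main__":
    # Synthetic stand-in: 79 features, only a handful of them informative.
    X, y = make_regression(
        n_samples=400, n_features=79, n_informative=10, noise=10.0, random_state=0
    )
    model, r2, n_kept = fit_lasso(X, y)
    print(f"held-out R^2: {r2:.3f}, features kept: {n_kept} of {X.shape[1]}")
```

The count of nonzero coefficients shows the feature selection that makes lasso a natural fit when many of the explanatory variables carry little signal.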

SKILLS AND TOOLS USED:
  • Data Cleaning: Python, Pandas
  • Plotting: Matplotlib, Seaborn
  • Linear Regression: Scikit-learn

Check it out on GitHub