Predicting Dissolved Oxygen Levels with a SARIMAX Time Series Model

Dissolved oxygen is a good indicator of stream health. Using measured dissolved oxygen levels from a relatively healthy stream, I fit a SARIMAX time series model to forecast future levels. Departures from the forecast can be used to flag changes in the health of the stream.
SKILLS AND TOOLS USED:
- Data Analysis: Pandas, Matplotlib, PACF, ACF
- Time Series Modeling: SARIMAX
ImageGEO: A Data Collection Tool using APIs and Flask

During the recovery phase immediately following a disaster, the Federal Emergency Management Agency (FEMA) performs damage assessment “on the ground” to determine the level of damage to residential parcels and critical infrastructure. To ensure an accurate estimate of the damage, it is important to understand the condition of the structures prior to the event.
To guide damage assessment after a disaster and help surveyors identify the structures of interest, this tool (a web app or mobile app) accepts a photo as input and retrieves the address, the Zillow price estimate, and Google Street View screenshots of the structure. The tool also includes a damage assessment form that records the level of damage to each structure alongside a pre-event photo of it.
SKILLS AND TOOLS USED:
- API: Zillow, Google Maps, Bing
- Web App: Flask, Dash
- Data Collection: Pandas, CSV
Check it out on GitHub
Baking vs Cooking: Natural Language Processing Classification
Using natural language processing, I analyzed the cooking and baking subreddits and built a support vector classification model that predicts which subreddit a piece of text came from with 88% accuracy.
SKILLS AND TOOLS USED:
- Web scraping and APIs for data gathering
- Natural Language Processing: Scikit-learn modules CountVectorizer, TfidfVectorizer
- Machine learning models: Scikit-learn LogisticRegression, Naive Bayes, RandomForestClassifier, Support Vector Machine
- Training and testing models: Scikit-learn modules Pipeline, GridSearchCV, train_test_split
Check it out on GitHub
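
A rough sketch of how the listed pieces can fit together, assuming a tiny made-up corpus in place of the scraped subreddit posts. The example texts, the linear kernel, and the parameter grid are illustrative assumptions, not the project's actual data or settings, and a toy corpus like this says nothing about the 88% figure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Made-up post titles standing in for scraped subreddit text.
baking = [
    "How long should I proof my sourdough dough before baking",
    "My cookies spread too much in the oven",
    "Best flour for a chewy bread crust",
    "Buttercream frosting keeps splitting on my cake",
    "Why did my muffins sink after baking",
    "Tips for laminating croissant dough with butter",
    "Is baking powder the same as baking soda for cake",
    "My pie crust shrinks when I blind bake it",
    "How to get a taller rise on sourdough bread",
    "Brownies came out cakey instead of fudgy",
    "Can I freeze cookie dough before baking",
    "Macarons cracked on top in the oven",
]
cooking = [
    "How do I sear a steak in a cast iron pan",
    "Best way to season a stir fry sauce",
    "My chicken breast always turns out dry",
    "How long to simmer a beef stew",
    "Tips for cooking rice without it sticking",
    "What oil should I use to fry fish",
    "How to make a pan sauce after roasting chicken",
    "My garlic burns when I saute onions",
    "Slow cooker pulled pork came out tough",
    "How to thicken soup without cream",
    "Grilling vegetables without drying them out",
    "Best knife for chopping onions and garlic",
]
texts = baking + cooking
labels = ["baking"] * len(baking) + ["cooking"] * len(cooking)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Vectorize the text, then classify with a linear support vector machine.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svc", SVC(kernel="linear")),
])

# Small illustrative grid; a real search would cover more options.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svc__C": [0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

predictions = search.predict(X_test)
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_test, y_test))
```

Putting the vectorizer inside the pipeline means each cross-validation fold builds its vocabulary from its own training split, so no text from the validation split leaks into the features.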
Predicting Housing Prices: Multivariable Linear Regression

Given 79 explanatory variables describing residential homes in Ames, Iowa, can we create a model that accurately predicts house prices? Using a lasso linear regression, I predicted the prices of houses in Ames, Iowa with 90% explained variance. I also used regression to impute null values.
SKILLS AND TOOLS USED:
- Data Cleaning: Python, Pandas
- Plotting: Matplotlib, Seaborn
- Linear Regression: Scikit-learn
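
As an illustration of the technique rather than the project's code, the sketch below fits a cross-validated lasso to synthetic data in which only a few features carry signal. The data, feature count, and coefficients are invented for the example; the Ames dataset would also need cleaning and encoding first.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem (assumed): 20 features, only 5 informative.
rng = np.random.default_rng(42)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scale features so the L1 penalty treats them comparably, then let
# LassoCV pick the regularization strength by cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=42))
model.fit(X_train, y_train)

# R^2 on held-out data is the "explained variance" style of score.
r2 = model.score(X_test, y_test)
lasso = model.named_steps["lassocv"]
n_nonzero = int(np.sum(lasso.coef_ != 0))
print(f"Held-out R^2: {r2:.3f}")
print(f"Features kept by the lasso: {n_nonzero} of {n_features}")
```

The L1 penalty shrinks uninformative coefficients to exactly zero, which is what makes lasso a natural fit for a dataset with many explanatory variables.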