The purpose of this project is to take a legal document, such as a contract, model its topics, and create a pipeline that tags parts of the document with relevant labels. This notebook focuses on preprocessing the data, topic modeling, and creating the training set. Ultimately the code in this repo will be useful for people who want to understand a complex legal document, such as a credit card agreement, more clearly.
The data comes from the following link: https://www.consumerfinance.gov/credit-cards/agreements/
The Consumer Financial Protection Bureau (CFPB) collects credit card agreements from creditors on a quarterly basis and posts them at the link above. The CFPB organizes the data into one directory per participating company, with all of that company's agreements collected inside it. For Q4 of 2018 there are 652 companies, and each company has on average 2-4 agreements.
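A minimal sketch of walking that one-directory-per-company layout with `pathlib`. The directory and file names below are illustrative stand-ins, not the actual CFPB names; here we build a tiny mock tree so the snippet is self-contained.

```python
from pathlib import Path
import tempfile

# Build a tiny stand-in for the CFPB layout: one directory per company,
# each holding that company's agreement PDFs (names are made up).
root = Path(tempfile.mkdtemp()) / "agreements"
mock_layout = {
    "Acme Bank": ["card_agreement_1.pdf", "card_agreement_2.pdf"],
    "First Example CU": ["agreement.pdf"],
}
for company, files in mock_layout.items():
    d = root / company
    d.mkdir(parents=True)
    for f in files:
        (d / f).touch()

# Walk the layout: one entry per (company, agreement) pair.
pairs = [(p.parent.name, p.name) for p in sorted(root.glob("*/*.pdf"))]
print(len(pairs))  # 3 agreements across 2 companies
```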
For most people, contract documents are not fun to read: they are usually written in complex legal jargon, and the style of writing is purposely dry so as to spell out worst-case scenarios. That said, it is important to understand what you or your business is getting into before signing any sort of agreement. Because it takes a certain expertise to understand these documents, it would be interesting to see whether we can leverage natural language processing techniques to tag them.
This repo will enable you to input a credit card agreement PDF and output labeled sections of the document, making it easier to read. Please see example.ipynb for a walkthrough on how to use this repo.
The notebook contract_reader.ipynb has further details on how the repo is constructed.
Overall the dataset contains over 200K headlines from the Huffington Post between 2012 and 2018. The dataset has six columns that capture the category, headline, author, link, description, and date the article was published. There are 40 different categories, ranging from politics to education; the top categories are politics, wellness, and entertainment. For the purposes of this notebook we won't be using the other columns, but it is worth noting that each date may have more than one headline. More information about the data can be found below this abstract.
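To make the schema concrete, here is a toy stand-in with the six columns described above (the column names are assumptions based on the common distribution of this dataset; the real file, typically JSON lines, would be loaded with `pd.read_json(..., lines=True)`):

```python
import pandas as pd

# Toy rows mimicking the six-column schema; values are invented.
df = pd.DataFrame({
    "category": ["POLITICS", "WELLNESS", "POLITICS", "ENTERTAINMENT"],
    "headline": ["Headline A", "Headline B", "Headline C", "Headline D"],
    "authors": ["x", "y", "x", "z"],
    "link": ["url1", "url2", "url3", "url4"],
    "short_description": ["...", "...", "...", "..."],
    "date": ["2018-05-26", "2018-05-26", "2018-05-25", "2018-05-24"],
})

# Top categories by headline count; note that one date carries two headlines.
counts = df["category"].value_counts()
print(counts.head())
```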
The goal of the notebook will be to take the headline column and use topic modeling to recreate the categories. Since we already have hand-labeled category information, it will be interesting to see if our models match the ground truth data that we have. To accomplish this we will use non-negative matrix factorization (NMF) to 1) choose the optimal number of topics and 2) associate documents and terms with those topics. NMF is explained in further detail below, but in short it decomposes a document-term matrix into two non-negative factors from which you can read off document-topic and topic-term associations.
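A minimal sketch of that decomposition using scikit-learn's `NMF` on a toy document-term matrix (the matrix values and k=2 are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy document-term matrix V: 4 documents x 5 terms (raw counts).
V = np.array([
    [3, 0, 1, 0, 0],
    [2, 0, 0, 1, 0],
    [0, 4, 0, 0, 2],
    [0, 3, 0, 0, 1],
], dtype=float)

# Factor V ~ W @ H with k=2 topics:
#   W (4x2) holds document-topic weights, H (2x5) holds topic-term weights.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)
H = model.components_
print(W.shape, H.shape)
```

Because both factors are constrained to be non-negative, large entries in a row of W read directly as "this document loads on this topic," and large entries in a row of H as "this term characterizes this topic."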
The project will proceed in multiple stages: 1) preprocess the text; 2) create a document-term matrix using tf-idf; 3) fit an NMF model to the doc-term matrix; 4) select the optimal number of topics by calculating topic coherence with word2vec; 5) for the optimal k, print the top terms and documents per topic and compare against the original labels for accuracy.