Decision trees are a powerful tool for modeling non-linear and otherwise complex relationships among many variables. They are also computationally inexpensive compared to many other algorithms. There are many forms of decision trees, but in this post I will cover only the basics of the algorithm.

Decision trees…


In this post I will talk about the assumptions of linear regression. Some must be checked before modeling begins; others need to be checked afterward using the residuals obtained from training. …


Many statistical tests require that the sample values be normally distributed. Unfortunately, they often are not, but there is a nifty trick to work around this problem. The solution relies on what is called the central limit theorem.

The central limit theorem states…
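As a rough illustration of the idea, the sketch below draws many random samples from a deliberately skewed population and looks at the means of those samples. It assumes the standard textbook statement of the theorem (means of sufficiently large random samples are approximately normally distributed, with spread close to the population standard deviation divided by the square root of the sample size). The population, sample size, and function name are illustrative choices, not taken from the post.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# A clearly non-normal population: exponential values, heavily right-skewed.
population = [random.expovariate(1.0) for _ in range(20_000)]

def sample_means(data, sample_size, n_samples):
    """Draw n_samples random samples (with replacement) and return each mean."""
    return [
        statistics.fmean(random.choices(data, k=sample_size))
        for _ in range(n_samples)
    ]

sample_size = 50  # hypothetical sample size chosen for illustration
means = sample_means(population, sample_size=sample_size, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means cluster around the population mean, and their spread
# is close to the population standard deviation over sqrt(sample_size).
print("population mean:         ", round(pop_mean, 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("sd of sample means:      ", round(statistics.stdev(means), 3))
print("population sd / sqrt(n): ", round(pop_sd / sample_size ** 0.5, 3))
```

Plotting a histogram of `means` next to a histogram of `population` makes the contrast visible: the population is skewed, while the sample means look roughly bell-shaped.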


One of the most sought-after skills in data analytics is the ability to tell a story with data. Influencing business decisions is the reason you used the data visualizations, statistics, and machine learning in the first place. There should be plenty of domain knowledge put to use and you…


Cross validation is an alternative to using a single validation set for machine learning models. There are several ways in which using cross validation instead of a standalone validation set can improve the performance of your models. …
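To make the idea concrete, here is a minimal sketch of how k-fold cross validation might split the data, assuming the common k-fold variant (the post may cover other schemes). Every row is used for validation exactly once and for training in the remaining folds. The function name `k_fold_indices` is hypothetical and written only for this illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

In practice the rows are usually shuffled before splitting, and the model is trained and scored once per fold, with the scores averaged at the end.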


Using just Python and NumPy

Why use K-Neighbors?

K-Neighbors is good at modeling complex non-linear relationships. It is a supervised form of machine learning and differs from linear and tree-based algorithms because it is distance-based.

How Does K-Neighbors Work?

K-Neighbors estimates values by examining nearby data points. A K-Neighbors model works by first…
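One way to picture "examining nearby data points" is the sketch below. It is not the post's implementation: it assumes the classification variant with Euclidean distance and a majority vote, and it uses only the Python standard library rather than NumPy so that it runs anywhere. The function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for query by majority vote among its k nearest points."""
    # Pair every training point's distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest training points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two well-separated clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, query=(1.2, 1.4)))
print(knn_predict(points, labels, query=(8.7, 8.8)))
```

For regression, the vote would typically be replaced by the average of the neighbors' target values.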


By George Bennett

Pandas is a Python library that helps with organizing and exploring data. It is built on the NumPy library. Pandas stores data in neat tables called “DataFrames” that can be easily manipulated. I will be explaining how to create DataFrames, how to quickly get information from…



If you're learning Python you will quickly become familiar with the list datatype. Lists are ordered collections of data, whether numbers, strings, other collections, or any other objects. NumPy arrays are similar to lists, but when dealing with numerical information they simplify mathematical processes. …



When using machine learning algorithms it is often a good idea to first scale the data. Scaling puts all the features on a level playing field. Say you are using a distance-based algorithm and you have one feature with a range in the…
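As a small sketch of what a "level playing field" can mean, the example below applies min-max scaling, which is one common scaling method and an assumption here, since the excerpt does not say which method the post uses. The feature names and values are made up for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features on very different scales.
incomes = [30_000, 45_000, 120_000]
ages = [22, 35, 58]

scaled_incomes = min_max_scale(incomes)
scaled_ages = min_max_scale(ages)

print(scaled_incomes)
print(scaled_ages)
```

After scaling, both features lie between 0 and 1, so a distance calculation is no longer dominated by whichever feature happened to have the larger raw range.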



Whether you are using Tableau, SQL, Excel, or Google Sheets, it is a good idea to know how to make joins.

Joins are used when you have two tables containing information that can be linked together. Many times in databases there will be identifier columns which…
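To show the linking idea outside any particular tool, here is a minimal Python sketch of an inner join, which keeps only the rows whose identifier appears in both tables. The tables, the column names, and the `inner_join` helper are all hypothetical; in SQL the equivalent would be an `INNER JOIN ... ON` clause.

```python
# Two small tables represented as lists of row dictionaries.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cleo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 3, "total": 15.5},
    {"order_id": 13, "customer_id": 4, "total": 9.0},
]

def inner_join(left, right, key):
    """Combine rows from left and right that share the same value for key."""
    # Index the right table by key so each lookup is fast.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row per matching (left, right) pair.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]

joined = inner_join(customers, orders, key="customer_id")
for row in joined:
    print(row)
```

Ben has no orders and order 13 has no matching customer, so neither appears in the result; a left or outer join would keep those rows and fill the missing side with empty values.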

Datascience George

Data scientist learning at Flatiron School
