On May 1st, Austin voted to pass Prop B by a resounding 57% to 43%. Proposition B reinstated the city camping ban, which was reversed in May 2019. Many were surprised by the landslide result given how controversial the issue was within the local community. Those in favor of Prop B cited public safety and cleanliness as their key reasons for reinstating the ban. Locals claimed they did not want Austin to ‘become like California’, referring to the significant homeless populations in San Francisco and Los Angeles. …

How to Differentiate Between Categorical and Numeric Variables

When presented with a dataset, a typical first step is to explore the variables you will be working with. In a data frame, these are the columns. Sometimes called features, predictors, or explanatory variables, in their rawest form these are simply independent or dependent variables. In other words, they are the characteristics that determine our outcome, along with the outcome itself.

Unfortunately, interpreting these characteristics may not be easy. If whoever generated the data is kind, they will have left a description of the variables. …
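When no description is available, a rough first pass over each column can help. The sketch below is only an illustration, not the method from the full post: the function name `classify_column`, the `max_categories` threshold, and the sample columns are all hypothetical. It assumes a column is numeric if every non-missing value parses as a number, unless the column holds only a handful of repeated values, in which case the numbers are probably category codes.

```python
def classify_column(values, max_categories=5):
    """Guess whether a column is 'numeric' or 'categorical' (heuristic only)."""
    # Ignore missing entries so they do not affect the guess.
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "unknown"

    # Anything that cannot be read as a number is treated as categorical.
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        return "categorical"

    # A few distinct numbers that repeat often act as labels (e.g. a 1-3 rating).
    distinct = len(set(numbers))
    if distinct <= max_categories and distinct < len(numbers):
        return "categorical"
    return "numeric"


# Hypothetical columns, as they might appear in a small data frame.
columns = {
    "age": [34, 51, None, 45, 62, 38, 27],
    "city": ["Austin", "Dallas", "Austin", "Houston", "Austin", "Dallas"],
    "rating": [1, 3, 2, 3, 1, 2],
}

kinds = {name: classify_column(vals) for name, vals in columns.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

A guess like this is a starting point for inspection, not a substitute for it: a column of ZIP codes, for example, would need a larger threshold or a manual override to be treated as categorical.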

The purpose of this blog is to provide a general framework for exploring and cleaning data. When tasked with answering a question based on a dataset, many wonder where to begin. While this guide is not meant to be exhaustive, it is intended to provide suggestions and tips for how to begin cleaning data. Once our data is clean, we can begin answering the questions at hand.

Clean data is important for a number of reasons. First, many machine learning and statistical models will not run without it. Second, if our data is complete but inaccurate, our results will be…

Ethan Kunin

