Technical debt for data scientists Apr 19, 2019 Technical debt is the process of avoiding work today by promising to do work tomorrow. A team might identify that there’s a small time window for a particular change to be implemented and the only way they can hit that window is to take shortcuts in the development process. They might soberly calculate that the benefits of getting something done now are worth the costs of fixing it later. This kind of technical debt is similar to taking out a mortgage or small business loan. ...
Testing machine learning models with testthat May 1, 2018 Automated testing is a huge part of software development. Once a project reaches a certain level of complexity, the only way that it can be maintained is if it has a set of tests that identify the main functionality and allow you to verify that functionality is intact. Without tests, it’s difficult or impossible to identify where errors are occurring, and to fix those errors without causing further problems. ...
Advice for non-traditional data scientists Aug 29, 2017 I have a pretty strange background for a data scientist. In my career I’ve sold electric razors, worked on credit derivatives during the 2008 financial crash, written market reports on orthopaedic biomaterials, and practiced law. I started programming in R during law school, partly as a way to learn more about data visualization and partly to help analyze youth criminal justice data. Over time I came to enjoy programming more than law and decided to make the switch to data work about three years ago. ...
Why you should work remotely, even if you're not remote May 3, 2017 My last job was as a data scientist at Upworthy, which is a 100% remote company. Prior to starting the position I was worried about whether I could be happy and productive on a remote team. I wondered how project planning would work, whether it would be terribly lonely, and how communication would function when things got hectic. What I discovered is that the company was one of the more efficient and friendly places that I’ve worked, and I think the changes that they have made to accommodate remote work deserve much of the credit. ...
Data Visualization and UI design Apr 13, 2017 Over the past couple of months, I’ve been rebuilding the Shambhala Meditation Timer using React Native and Redux. The idea behind the Shambhala app was to create a kind of modular framework for building meditation timers in order to allow people to create complex timers out of simple components. The three build blocks for a timer are time intervals, gong sounds, and recorded audio contemplations, and the user can stack these building blocks to create whatever kind of meditation session they want. ...
The Power of Tidy Data Feb 16, 2017 Tidy Data Tidy data has become the dominant way of thinking about problems in R. The idea behind tidy data is to develop an ecosystem of R packages which all work around a similar kind of data structure. That way you can easily compose many different tools together to accomplish very complex tasks in an iterative, easy to understand fashion. There are lots of excellent presentations about why this is a great approach but the one I would recommend if you are new to this area is Hadley Wickham’s keynote from the 2017 rstudio conference. ...
R for Excel Users Feb 2, 2017 Like most people, I first learned to work with numbers through an Excel spreadsheet. After graduating with an undergraduate philosophy degree, I somehow convinced a medical device marketing firm to give me a job writing Excel reports on the orthopedic biomaterials market. When I first started, I remember not knowing how to anything, but after a few months I became fairly proficient with the tool, and was able to build all sorts of useful models. ...