OT Data science lessons I have learned over several decades

Art Kendall
For discussion. Clarifications? Omissions? Etc.

* There are many good reasons why exercises would use clean data sets.
However, IMO, teachers should mention repeatedly things like these other

* A major purpose of stat is to provide solid arguments and understanding in

* Real data rarely arrives clean in many contexts. The time and effort for
cleaning and prepping data are often vastly more than actually doing

* The data definition should be as complete and communicative as practical
before any analysis is attempted.

* Maintaining the distinction between system-missing and user-missing values
is very beneficial. Interpret a system-missing value as "the writer needs to
redraft syntax."

* Missing data causes problems in reasoning.
The best time to minimize missing data problems is when designing the data
gathering instruments and administration.

* A major benefit of intro classes is to facilitate working with specialists
at design and implementation phases when it comes to actual projects.

* The GUI is great at drafting syntax.

* Like any other writing, re-drafting syntax is an iterative process. With time
one gets further into understanding data science in general and into the
particular data of a project. Computers do what one *says*. Syntax needs to
be re-drafted until it does what one *means*.

* Syntax is a communication language. One gets better at it with experience.
Syntax needs to be clear to the software, the writer, and to other people.

* In today's computing environments, costs for personnel time are often a
more important consideration than costs for computer resources.
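
To make the system-missing vs. user-missing point concrete for people who also work outside SPSS, here is a minimal sketch in Python (not SPSS syntax). The codes -9 and -8, their labels, and the function name classify are made up for illustration. The idea is only that a user-missing value carries a stated reason, while a system-missing value (None or NaN here) carries none and so flags that the syntax or the data definition needs another draft.

```python
import math

# Hypothetical user-missing codes for one survey item (assumed for illustration).
USER_MISSING = {-9: "refused", -8: "not applicable"}


def classify(value):
    """Label a raw value as valid, user-missing, or system-missing."""
    # None or NaN stands in for system-missing: no reason was recorded.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "system-missing"
    # A declared code is user-missing: the reason is documented.
    if value in USER_MISSING:
        return "user-missing: " + USER_MISSING[value]
    return "valid"


raw = [3, -9, 5, -8, None]
labels = [classify(v) for v in raw]

# Any system-missing value is a prompt to revisit the syntax or data definition.
needs_redraft = labels.count("system-missing") > 0
```

In this sketch needs_redraft is a reminder to go back and decide why the value is missing and give it an explicit code, rather than an error to suppress.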

