OT: Data science lessons I have learned over several decades
For discussion. Clarifications? Omissions? Etc.
* There are many good reasons why exercises would use clean data sets.
However, IMO, teachers should mention repeatedly things like these other
* A major purpose of statistics is to provide solid arguments and understanding in
* In many contexts, real data rarely arrives clean. The time and effort for
cleaning and prepping data are often vastly more than actually doing
* The data definition should be as complete and communicative as practical
before any analysis is attempted.
* Maintaining the distinction between system-missing and user-missing values
is very beneficial. Interpret a system-missing value as "the writer needs to
* Missing data causes problems in reasoning.
The best time to minimize missing data problems is when designing the data
gathering instruments and administration.
* A major benefit of intro classes is to facilitate working with specialists
at design and implementation phases when it comes to actual projects.
* The GUI is great at drafting syntax.
* As with any other writing, re-drafting is an iterative process. With time one
gets further into understanding data science in general and into the
particular data of a project. Computers do what one *says*. Syntax needs to
be re-drafted until it does what one *means*.
* Syntax is a communication language. One gets better at it with experience.
Syntax needs to be clear to the software, the writer, and to other people.
* In today's computing environments, costs for personnel time are often a
more important consideration than costs for computer resources.
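
To make the points about data definitions and the two kinds of missing values a bit more concrete, here is a rough sketch in Python rather than SPSS syntax. Everything in it is a made-up assumption for illustration: the variable name, the labels, the codes 8 and 9, and the use of None as a stand-in for a system-missing value. It only shows the general idea that user-missing codes are declared, with reasons, in the data definition, while system-missing is whatever is left when the software could not produce a value.

```python
# Hypothetical data definition for one survey item. The variable name,
# labels, and codes are illustrative assumptions, not from a real study.
DATA_DEFINITION = {
    "satisfaction": {
        "label": "Overall satisfaction with service",
        "value_labels": {1: "Low", 2: "Medium", 3: "High"},
        # User-missing: codes the analyst declares as missing, with reasons.
        "user_missing": {8: "Don't know", 9: "Refused"},
    }
}

# Stand-in (an assumption) for a value the software could not produce.
SYSTEM_MISSING = None


def classify(variable, value, definition=DATA_DEFINITION):
    """Return 'valid', 'user-missing', 'system-missing', or 'undefined'."""
    spec = definition[variable]
    if value is SYSTEM_MISSING:
        return "system-missing"
    if value in spec["user_missing"]:
        return "user-missing"
    if value in spec["value_labels"]:
        return "valid"
    # A code the data definition does not cover: flag it for review.
    return "undefined"


def tabulate(variable, values, definition=DATA_DEFINITION):
    """Count how many values fall into each category."""
    counts = {"valid": 0, "user-missing": 0, "system-missing": 0, "undefined": 0}
    for value in values:
        counts[classify(variable, value, definition)] += 1
    return counts


if __name__ == "__main__":
    raw = [1, 3, 9, None, 2, 8, 7, 3]
    print(tabulate("satisfaction", raw))
```

Keeping the four counts separate is the point of the sketch: a nonzero "system-missing" or "undefined" count is a prompt to go back to the data or the syntax, whereas "user-missing" counts are already explained by the data definition.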