Data Science = Statistics
It might surprise you that data science is not something new. It has been around since before many of us were born.
Although use of the term “data science” has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician. Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”
Data scientists use their data and analytical ability to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate their insights and findings. They are often expected to produce answers in days rather than months, to work by exploratory analysis and rapid iteration, and to present results with dashboards (displays of current values) rather than the papers and reports that statisticians normally produce. Data scientists MUST understand statistics.
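A few of the tasks listed above (merging data sources, checking consistency, and summarizing the result) can be sketched in a few lines of Python. This is only a minimal illustration: the `customers` and `orders` data, the field names, and the helper functions `merge_sources` and `mean_amount_by_region` are all hypothetical and do not come from any particular tool or dataset.

```python
import statistics

# Hypothetical source 1: customer records keyed by customer id.
customers = {
    1: {"name": "Ana", "region": "north"},
    2: {"name": "Ben", "region": "south"},
    3: {"name": "Cho", "region": "north"},
}

# Hypothetical source 2: order amounts that refer to a customer id.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 4, "amount": 45.0},   # no matching customer record
    {"customer_id": 3, "amount": -10.0},  # negative amount, treated as invalid
]


def merge_sources(customers, orders):
    """Join orders to customers and set aside rows that fail basic checks."""
    merged, rejected = [], []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None or order["amount"] < 0:
            rejected.append(order)
            continue
        merged.append({**order, **customer})
    return merged, rejected


def mean_amount_by_region(rows):
    """Group merged rows by region and return the mean order amount per region."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: statistics.mean(amounts) for region, amounts in by_region.items()}


merged, rejected = merge_sources(customers, orders)
summary = mean_amount_by_region(merged)
print(summary)
print(f"{len(rejected)} rows rejected by the consistency checks")
```

The summary at the end is an ordinary arithmetic mean per group, which is exactly the kind of statistic a statistician would compute by hand or in a report.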
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. A population can be as varied as “all people living in a country” or “every atom composing a crystal”. Statistics deals with all aspects of data, including the planning of data collection through the design of surveys and experiments.
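The idea of starting from a population can be made concrete with a small sketch: draw a random sample from a population and use it to estimate the population mean. The population here is made up (10,000 simulated ages), and the function name `estimate_mean`, the sample size, and the seeds are assumptions chosen only for illustration.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and estimate the mean.

    Returns the sample mean and its estimated standard error.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


# Hypothetical population: the ages of 10,000 people, simulated for this example.
population_rng = random.Random(42)
population = [population_rng.randint(0, 90) for _ in range(10_000)]

population_mean = statistics.mean(population)
sample_mean, std_error = estimate_mean(population, sample_size=500)

print(f"population mean:      {population_mean:.2f}")
print(f"sample mean (n=500):  {sample_mean:.2f} +/- {std_error:.2f}")
```

With a well-designed sample, the estimate should land close to the true population mean, and the standard error gives a rough sense of how far off it is likely to be.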
The term “data science” (originally used interchangeably with “datalogy”) has existed for decades: Peter Naur first used it in 1960 as a substitute for computer science. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods used in a wide range of applications.
In short, the term “data science” says more about methodology, process, context, interpretation, and how fast you produce results than about the underlying math. The math behind the statistics is still the same.