That’s So Second Millennium

Episode 104 - Scraping Facts Online: If You Can’t Beat ’Em, Datum

June 22, 2020
  1. At the time of this taping, Paul was in the middle of the Metis “bootcamp” program learning the capabilities, tools, and insights of data science. This conversation ranged widely in the realm of data analysis and management, examining its relevance to Paul’s field of geology but also exploring the world’s immersion in what Bill would call a data ecology: It seems every datum is connected, or connectable, to every other datum. (“Datum” is the original singular form of the plural word “data.”)
  2. The growing plethora of data has to be tracked and organized, even though today’s computer hardware doesn’t allow all the world’s data (or even relatively large slices of it) to be stored and analyzed in one place at one time. Realizing that words are data, too, Paul pointed out that geology encountered a data explosion crisis a few decades ago, as scientists coined so many new names for rocks that the new information became less useful. That was until geologists produced a plan for sorting out and categorizing rock names according to rocks’ bulk chemistry instead of their constituent minerals (example here). Paul came to see the value of advanced organization in obtaining, thinking about, and acting upon geological data; hence his pursuit of this certificate in data science.
  3. Discussion of this specific field of science led to the use of various other terms, with various meanings, none of them fully understood by Bill. The terms included informatics, data scraping, the analysis of data clustering, “big data,” and “machine-learning algorithms.” These terms can be expected to be influential in nearly all fields, so it behooves the layperson to develop some familiarity with them. It is quite possible to become skeptical of a body of knowledge and skills that, like everything, can be used for benevolent or malevolent purposes. But Paul said the hopeful side of his personality recognizes what data scientists already recognize: this amazingly powerful field also has its limitations.
  4. He recalled that there is an author currently writing books with a robust skepticism about machine learning. Separately, one can get a laugh from the current results seen in the hybrid field of machine-learning poetry. Bill guessed the author was Julia Evans, but it was likely Janelle Shane, the author of You Look Like a Thing and I Love You.
  5. The bottom line is that, as with all science, its tools and results cannot provide their own guidance on how to use wisely the fruits they bear. The guidance must come from external forces driven by human virtue and values.

Liner notes by Bill. Audio editing by Morgan. Cover art for this episode was produced by Paul... in conjunction with the Landsat 8 mission, the scikit-learn and seaborn libraries, and Mauna Loa and Kilauea volcanoes. (See his final project slides here.)
