Dean’s Blog: Seventy

March 27, 2023
Dean Phil Bourne cuts his birthday cake

Our School of Data Science (SDS) team was kind enough to throw me a surprise party for my seventieth birthday this past week. As I wandered among the mainly young faces, many with their families, I could not help but reflect on how lucky I am to work with such folks, and on how it seems like only yesterday that I was where they are now. Juggling family and professional life is a whirlwind in which time passes quickly. The need to maintain an appropriate work-life balance has not changed in five decades. On the flip side, the science itself has changed profoundly. I have tried to capture this change in a memoir I am writing titled My Life with Data. Here are a few snippets from across the five decades.

In my twenties I discovered quantitative comparative analysis. To the chagrin of my PhD supervisor, I cast aside his project in favor of exploiting a data resource called the Cambridge Structural Database (CSD), which still exists today. Here was a treasure trove of data that was perceived mostly as an archive from which to retrieve a single dataset on a small-molecule atomic structure. Comparing structures to discover similarities and differences had only just begun. We determined that the structures of caged hydrocarbons were influenced by the chemical properties of the bound halogen. Hey, I thought it was a great discovery at the time. I had a lot to learn. Importantly, it has shaped my thinking to this day: there is a wealth of data out there, much of it collected through sweat and toil. It is a travesty not to utilize it to the fullest, often in ways not conceived by the original data generators.

In my thirties I was consumed by computers and programming. For the first time I got to physically touch computers that were not locked away in secure machine rooms. Interactive command-line computing with UNIX was changing scientific computing. I wrote books on computing and had a monthly column in a computer magazine called The DEC Professional. Endless hours were spent programming a STAR array processor in assembly language, trying to squeeze out enough performance to run molecular dynamics simulations of relatively small molecules. I should have paid more attention to Moore’s Law (Gordon Moore, may he rest in peace) and bided my time doing something else.

In my forties we had the beginnings of the Human Genome Project, and it was obvious where the future of biology, and soon of every other field, would lie: the analysis of digital data. We had true synergy between computation and experiment; for the first time, how we thought about and designed an experiment was driven by what the computer could tell us. Computer analysis was no longer an afterthought. It was a watershed moment and led to the first joint grant at Columbia University between a life scientist (me) and a computer scientist (Carlton Pu). He talked about data structures and I talked about molecular structures. With the emphasis on storage, performance and analysis, many a happy hour was spent working with object-oriented databases and later relational databases and Java enterprise systems. So began a 20-year relationship with the Protein Data Bank (PDB), which, I think it is fair to say, became an exemplar for scientific databases. That success was underpinned by how we thought about the quality and representation of the data. Along the way, a small group of us pioneers started the field of bioinformatics and my own subfield of structural bioinformatics.

In my fifties, attention turned to open science as we started a new open-access journal in our field. The interplay between data, methods, outcomes and dissemination is still very much a work in progress. I had a vision of that future back in 2005, and it has still not been realized. The desire for open science, and open scholarship more broadly, is one of the guiding principles of SDS, and I am very proud that everyone has signed on to that notion. Looking back, it now seems that my fifties were a time when attention turned from self to service. Truth be told, being a scientist is very much about self. The whole system is geared towards self-advancement, with the notion that societal benefit happens along the way. Enter the sixties.

In my sixties I left the sunshine of San Diego and the cushy life of a tenured professor to become the first Associate Director for Data Science at the NIH. NIH was struggling with a massive influx of digital data across all scales, from molecules to populations. I wanted to serve. The Big Data to Knowledge (BD2K) initiative was started to address the issue. Most of the attention went to the “Big Data” and not the “Knowledge,” as issues of data access, compute models, sustainability and scalability were paramount. The struggle continues.

Now we get to the interesting part: my seventies. Two people who should know what they are talking about have recently described this as a “Prometheus Moment” (Thomas Friedman) and as the most important advance since the graphical user interface (Bill Gates). That Bill would compare AI to Windows speaks to a certain perspective, but hey, he is trying to do good in the world these days. I wrote about my view of this moment, which is about more than just AI, and about what we are doing in response, in a recent blog post, so I will not repeat it here. It is a magical and defining moment.

As I looked at those young faces at the party, and at the incoming students I had talked to earlier that day, I started to wish I were in my twenties again and just starting out. That feeling passed quickly as I thought about the amazing ride I have had thus far and what is still to come. My job now is to make sure those younger than me (aka almost everybody) make the most of this inflection point in human history, and to pass on the experiences I have had so they can do their best work. Onwards.

Dean Phil Bourne and Data Science Students

Author

Stephenson Dean