As part of the 2019 Charlottesville Women in Data Science (WiDS) Conference, we are offering five concurrent skills sessions on key topics in data science.
Registration for WiDS 2019 will open Feb. 4. Registrants will be able to sign up for skills sessions in March. In order to sign up for a skills session, you must register for WiDS. To stay up-to-date on WiDS programming and receive registration information when it opens, sign up here.
Introduction to Social Network Analysis
Led by Kelsey Campbell (Founder/Data Scientist, Gayta Science: Data Science and Analytics with a LGBTQ+ Focus)
Social Network Analysis (SNA) is a powerful and generalizable method that allows insight into the complicated patterns found within social connection data. This session will introduce SNA for a beginner audience, including how to structure data before building a network, how to calculate and interpret basic network statistics, and which tools and technologies are commonly used. A number of applied examples from across industry and academia, including a personal project focused on LGBTQ+ health, will demonstrate the versatility of SNA. Finally, a hands-on demo will walk through creating, visualizing, and analyzing social networks using industry-standard Python packages.
Attendees interested in following along with the demo should bring their laptops with Python 3.x installed (Anaconda Distribution recommended). The Jupyter notebook used in the demo will be made available after the session.
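For readers who want a feel for the basics beforehand, here is a minimal sketch of structuring connection data and computing two basic network statistics (degree and density). It is not taken from the session materials: the names and ties are made up, the function names are hypothetical, and it uses only the Python standard library rather than the packages the demo will use.

```python
from collections import defaultdict

# Hypothetical friendship ties, stored as an edge list (one pair per tie).
edges = [
    ("Ana", "Bo"),
    ("Ana", "Cy"),
    ("Bo", "Cy"),
    ("Cy", "Dee"),
]


def build_adjacency(edge_list):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree(adjacency):
    """Degree of a node = number of distinct neighbors it is tied to."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def density(adjacency):
    """Share of all possible undirected ties that are actually present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each tie appears in two neighbor sets, so halve the total.
    tie_count = sum(len(neighbors) for neighbors in adjacency.values()) / 2
    possible_ties = n * (n - 1) / 2
    return tie_count / possible_ties


adjacency = build_adjacency(edges)
degrees = degree(adjacency)
network_density = density(adjacency)

print(degrees)
print(round(network_density, 3))
```

A node with a high degree (here, "Cy") is tied to many others, while density summarizes how connected the network is as a whole.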
Visual Diagnostics for More Informed Machine Learning
Machine learning is ultimately a search for the combination of features, algorithm, and hyperparameters that results in the best-performing model. Oftentimes, this leads us to stay in our algorithmic comfort zones, or to resort to automated processes such as grid searches and random walks. Whether we stick to what we know or try many combinations, we are sometimes left wondering if we have actually succeeded.
By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the high-dimensional realm in which our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and greater understanding of underlying processes.
Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API by providing a Visualizer object, an estimator that learns from data and produces a visualization as a result. In this tutorial, we will explore feature visualizers; visualizers for classification, clustering, and regression; and model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more informed and more effective.
Data Visualization with Python and Jupyter Notebooks
In this skills session, you will learn how to get started with data visualization using Python and Jupyter notebooks. Learn what it takes to get set up and how to use popular libraries, including matplotlib, pandas, and bokeh, to build both static and interactive visualizations. We'll dive into a variety of public datasets, learn how to clean data, and iteratively build visualizations to better understand the data.
Analyzing Unstructured Information Using Text Mining in R
Surveys often collect unstructured interview answers, deliberately offering no fixed answer options, in order to capture interviewees' wide-ranging, free-wheeling thoughts on a topic. Using R, this tutorial will walk through the “bag of words” approach, a common natural language processing method, to illustrate how to analyze and discover patterns within free-format text answers easily.
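As a rough illustration of the idea, a bag of words reduces each answer to counts of the words it contains, ignoring word order. The session itself will use R; the sketch below is written in Python with only the standard library, and the sample answers, stop-word list, and function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical free-text survey answers.
answers = [
    "The workshop was helpful and the examples were clear.",
    "Helpful examples, but the pace was too fast.",
    "Too fast for beginners.",
]

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "was", "were", "but", "for", "a", "too"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def bag_of_words(text):
    """Represent one answer as word -> count, discarding word order."""
    return Counter(tokenize(text))


# One bag per answer, plus the combined counts across all answers.
bags = [bag_of_words(answer) for answer in answers]
totals = sum(bags, Counter())

print(totals.most_common(3))
```

Words that recur across many answers (here, "helpful", "examples", and "fast") point to themes worth a closer look.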