2020 Presentation Descriptions


This is an alphabetical listing of the member presentations due to be given at the 2020 HEDW Conference in Ogden, Utah. The formal Conference Agenda, listing specific times and locations for each presentation, will be available as the conference draws nearer.

Member Presentation Descriptions

A Faculty/Staff Data Model for IR 
Becca Baggett and Liz Conder, University of Utah

The University of Utah IR office is well into the implementation of a faculty/staff reporting architecture that has been a game changer in our ability to address the unique and highly complex reporting requirements of HR data in Institutional Research. Charged with generating faculty and staff reports for the state system, IPEDS, and other national surveys, while serving as the reporting arm for University leadership planning and assessment, we need the ability to reconcile multiple data points and choose winners among them; roll up, slice, and dice data in many and imaginative ways; and deliver year-over-year analyses against the same.

The core foundation of our new structure was rolled out in late 2019. In this session we will share the structure of our data model and the key data transformation decisions that have had a profound impact on our ability to generate reliable figures quickly and consistently, and we will demonstrate the data profiling techniques that drove our requirements analysis forward and secured important buy-in and trust from our reporting division. We’ll share what’s working, and candidly discuss what isn’t. If time permits, we’ll take a look at what comes next on our roadmap.

A Student Touch-Point Universe Revisited
Mick Haney, Nevada State College

This presentation will expand on the framework presented at HEDW 2019 for integrating student data from admissions to graduation and everything in between. We will briefly revisit the framework levels while introducing the new concepts of supporting Header and Trailer years, and go down a level to illustrate how we support Student Program and Plan information over time. We will show, technically, how to populate the framework, and then discuss how the framework drives the population and enhancement of an integrated student database. The presentation will close with a technical discussion (and SQL samples) of how the framework design is used to support retention and graduation rate analysis for any identifiable cohort. This presentation will be of interest to data architects, database developers, and anyone interested in creating a data-driven environment.
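To give a flavor of the kind of cohort analysis such a framework supports, here is a hedged sketch in Python (the session itself uses SQL; the student records and term codes below are hypothetical, not from the presentation):

```python
# Hypothetical rows: each student has a cohort term and a set of enrolled terms.
# Retention rate = share of a cohort still enrolled in a later check term.
def retention_rate(students, cohort_term, check_term):
    cohort = [s for s in students if s["cohort_term"] == cohort_term]
    if not cohort:
        return 0.0
    retained = [s for s in cohort if check_term in s["enrolled_terms"]]
    return len(retained) / len(cohort)

students = [
    {"student_id": 1, "cohort_term": "2018FA", "enrolled_terms": {"2018FA", "2019FA"}},
    {"student_id": 2, "cohort_term": "2018FA", "enrolled_terms": {"2018FA"}},
    {"student_id": 3, "cohort_term": "2019FA", "enrolled_terms": {"2019FA"}},
]
print(retention_rate(students, "2018FA", "2019FA"))  # 0.5
```

In SQL, the same idea is typically a join between a cohort table and term enrollment, grouped by cohort.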

Accelerating Analytics Across the Enterprise
Pramod Kunju, University of California – Irvine

Analytics across disparate data repositories has been an ongoing challenge in every organization. It usually involves moving data from multiple applications into a single data store, which is costly from both a data-movement and a modeling perspective.

There are several innovative solutions to this problem, one of the most effective being data virtualization. The session will cover use cases where this solution is appropriate, including an implementation plan and best practices.

According to Gartner, “by 2020, organizations utilizing data virtualization will spend 45% less than those who do not on building and managing data integration processes for connecting data assets.”

Attendees will learn techniques for data analytics across functional areas, including:
1) Quick and efficient prototyping without the need for expensive ETL
2) Creating a virtual single source of truth for the whole enterprise
3) Strategies to address short-term ad hoc information needs

Admission Application Weekly and Daily Snapshots
Hairong Liu, University of Missouri System

The project does trend analysis at various levels and in various categories for applications, admits, accepts, and enrollment. A flat table captures application details on a daily basis and is appended weekly at exactly the same time. The president’s office and the IR office use this table heavily through hundreds of reports: week-to-week trend analyses of application/admission/accept counts, trend analyses against past years, counts of resident versus non-resident students, counts at a more detailed level (first-time college, transfer, graduate, and professional students), and details on each individual applicant and application. An aggregated table has been created to improve report performance, and campus-specific WEEKLY/DAILY/CENSUS_DATE views provide access control among different roles. The tables are accessible in an Oracle database and are also loaded into Cognos for users to generate and run their own customized reports. The table includes, but is not limited to, term info, application info, applicant demographic and financial aid info, applied academic data, and test scores. These data points can be used for aggregated reports.
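The snapshot-append idea can be sketched in a few lines of Python (the production version is an Oracle table; the column names and dates below are made up for illustration):

```python
from datetime import date

# Sketch of a snapshot append: each run copies the current application rows
# into a history table, stamped with the snapshot date, so trends can be
# compared week over week even as the source data changes.
snapshot_history = []

def take_snapshot(current_applications, snapshot_date):
    for app in current_applications:
        row = dict(app)                 # copy, so later source changes don't leak in
        row["snapshot_date"] = snapshot_date
        snapshot_history.append(row)

apps = [{"app_id": 101, "status": "applied"}, {"app_id": 102, "status": "admitted"}]
take_snapshot(apps, date(2020, 1, 6))
apps[0]["status"] = "admitted"          # source data changes during the week
take_snapshot(apps, date(2020, 1, 13))

# Week-over-week trend: count admits per snapshot date
admits = {}
for row in snapshot_history:
    if row["status"] == "admitted":
        admits[row["snapshot_date"]] = admits.get(row["snapshot_date"], 0) + 1
print(admits)  # admit counts per snapshot: 1 in the first week, 2 in the second
```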

Advanced SQL for Warehousing and Analytics
Rochelle Smits-Seemann, Salt Lake Community College

Data and code will be made available for attendees to load into their own database instances or to use with Oracle’s free-tier cloud database (https://livesql.oracle.com). Using sample higher education data, we will discuss the SQL order of operations and how it affects query performance, analytic functions, and how SLCC applies these concepts in data warehouse development and business intelligence reporting. Topics covered will include windowing (analytic) functions (e.g., RANK versus DENSE_RANK), pivot/unpivot, date conversions, and duration calculations. Attendees should already have some familiarity with SQL.
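As a taste of the RANK versus DENSE_RANK distinction the session covers, here is a small Python illustration of the two SQL functions’ tie-handling (the GPA values are made up; in SQL these would be `RANK() OVER (ORDER BY gpa DESC)` and `DENSE_RANK() OVER (ORDER BY gpa DESC)`):

```python
def rank(values):
    # SQL RANK(): tied values share a rank; the next rank skips ahead.
    s = sorted(values, reverse=True)
    return [s.index(v) + 1 for v in values]

def dense_rank(values):
    # SQL DENSE_RANK(): tied values share a rank; no gaps afterwards.
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in values]

gpas = [4.0, 3.8, 3.8, 3.5]
print(rank(gpas))        # [1, 2, 2, 4]
print(dense_rank(gpas))  # [1, 2, 2, 3]
```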

Analyzing the Costs of Your Programs: It’s All About the Model
Adam J Raab, Embry-Riddle Aeronautical University

Revenue. Expenses. Margin. These terms are often considered anathema to the academy, but in today’s hyper-competitive market for students and funding, they are more important than ever. At the center of this issue is the cost of teaching. Over ten years ago, Embry-Riddle Aeronautical University endeavored to discover just what their academic programs were costing them. It was determined that a model was needed to combine student and financial data in a way that would produce actionable unit-cost-of-production information, usable by both academic and administrative leadership.

This presentation will introduce participants to this model and explain how it functions. It will be a technical presentation, covering topics ranging from data sourcing to computational methods. Context will be provided regarding the models that are available commercially and how they compare to the in-house model being presented. The intention of this presentation is to give participants something to work from and to set them on the right track toward their own model. It is intended for intermediate practitioners who are knowledgeable about their data and how to work with it, but may not understand program costing methodologies.

And that’s a RAP… How the University of Washington is transforming a 1,000+ Workday report inventory in short order!
Karen Matheson, University of Washington

In 2017, the University of Washington implemented a SaaS-based enterprise system for the first time, replacing its 35-year-old legacy HR/payroll system with Workday. During our first year with Workday, we quickly learned about Workday’s special brand of technical report and system configuration tools, which provided a backdrop for developing report development and maintenance best practices. However, we discovered significant technical debt from before go-live, which contributed to user frustration and low report utilization across campus. This session will share the highlights of our Reporting Adoption Project (RAP) strategy, which aims to address the technical debt as well as engage report stakeholders in working groups and testing. The project will culminate in a complete revision of the 1,000+ report inventory.

During the presentation we will cover our methodology for engaging campus stakeholders through surveys, focus groups, and report testing; revising our Workday security model; creating a training and outreach plan; and executing the technical revision of the report inventory.

API Directory adoption at University of Michigan
Kranthi Bandaru, University of Michigan

This presentation will cover the adoption of the API Directory at the University of Michigan. We will cover the following topics:
– Business and technical needs, along with use cases for implementing APIs
– Advantages of converting traditional applications to APIs
– An overview of the University of Michigan’s API Directory implementation
– Additional features such as subscription levels, throttling rates, auditing, and data governance
– How to overcome resistance from data owners and make them a part of the API journey
– What’s in the future for the API Directory
– How APIs can help your cloud migration strategy
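The throttling-by-subscription-level idea mentioned above can be sketched with a token bucket. This is a generic illustration, not the University of Michigan’s implementation; the tier names and rates are hypothetical:

```python
import time

# Token-bucket throttling: each subscription tier refills at its own rate,
# and a request is allowed only if a token remains in the bucket.
class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bronze = TokenBucket(rate_per_sec=1, capacity=2)   # hypothetical low tier: small burst
results = [bronze.allow() for _ in range(3)]       # third immediate call exhausts the bucket
print(results)  # [True, True, False]
```

A higher tier would simply get a larger `rate_per_sec` and `capacity`; API gateways apply the same logic per API key.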

Automating dashboard updates via Google BigQuery
Matt Baxter and Rochelle Smits-Seemann, Salt Lake Community College

We will begin by giving an overview of our systems used to bring data from our Oracle warehouse to Google BigQuery, and from there to Tableau. We will discuss the reasons why we chose to use Google BigQuery and how we have automated this workflow to update business intelligence dashboards. We will also cover cost calculation and optimization on Google BigQuery and how we make sure our data is secure on the cloud. We will conclude with a short demonstration, and will provide code examples. Attendees will leave with knowledge of how to begin setting up a cloud environment and automating data pipelines for dashboards.

BI Strategy: The Business of Intelligence at New York University
Addis Crooks-Jones and Andrew Maliszewski, New York University

1. Introductions
   About NYU
   NYU IT

2. Data warehousing through the years

3. BI through the years
   Where we describe the BI tools & technologies leveraged
   Why we abandoned some things
   Why we held on to others (for better or worse)
   Panaceas and pitfalls

4. Evolution of project and product management
   How we manage our workstream and product lifecycle
   How and why agile?
   How we have structured our teams

5. Better, bolder strategies
   Biggest pivots in our strategy over the last 10 years

6. The future
   What we are working on now in the lab
   Vexing problems that we hope to solve

Building a Data Warehouse for Workday Student
Mike Salisbury, University of Rochester

The University of Rochester has recently become the first R1 institution to go live with Workday Student (branded UR Student). As part of our UR Student program, we have also replaced our previous student data warehouse with a redesigned UR Student data warehouse integrated with Workday Student. This presentation will discuss our strategy, evaluation of Workday tools, implementation approach, and architecture, as well as our hard-earned insights into the challenges, critical success factors, and best practices. If your institution is planning to transition to Workday Student in the future, come and learn from our experiences.

Building a SAS Admissions data model including dashboards, reports and data dictionary
Nadia Mankins, University of North Texas

Admissions can be a complex subject, covering areas such as test scores, high school and college information, events attended, housing applications, orientation, and the different application processes among the INTL, UGRD, and GRAD offices. My goal with this presentation is to provide you with useful tools and examples so you can complete a successful SAS data model within a year, including data validation, dashboard/report creation, and data governance documentation. I will share my experience of how helpful it is to have an organized, comprehensive worksheet with all the table and field/index mappings, field lengths, calculated fields, etc.; this saves a lot of time when identifying fields for the data dictionary and the Data Governance team. I will also suggest which fields could be part of the daily snapshots. In addition, this presentation will show the importance of having multiple resources available in one model, so users can filter the data based on their needs. Example dashboards will also show which areas could use some attention, since missing data, such as missing zip codes, can affect the reliability of outputs.

Continuing the Journey: Building out Yale’s Common Reporting Platform for Data-driven Decision-making
Kathleen Broderick and Ryan Schlagheck, Yale University

This session will primarily focus on various aspects of Yale’s new data warehouse and reporting platform build-out and implementation. We’ll explore the decisions, trade-offs, strategy for community engagement and buy-in, reporting and data governance practices, and lessons learned as we continue to drive significant change in a complex and distributed data and reporting environment. The presentation will also cover our core principles for the Common Reporting Platform, which differentiate it from a traditional data warehouse implementation:
• Provides both analytical and reporting abilities which individual source systems may not support (as reporting capabilities in source systems vary)
• Fills gaps in operational reporting due to lack of reporting capabilities (e.g., event-based scheduling, pixel-perfect formatting, collating, data visualization)
• Supports trending and longitudinal reporting and allows us to bring together multiple data families for analysis
• Provides data quality screening and feedback to source system owners
• Governs representation and use of data in an appropriate manner
• Scalable (data, platform, and toolset) and flexible (source-system agnostic)

The presentation will also cover processes, options, and challenges encountered while constructing the Common Reporting Platform and implementing it under increasing user pressure.

Attendees will benefit from learning how Yale moved away from its legacy approaches, from the technologies we use to how we engage with our various stakeholders. They will also learn about Yale’s strategies for data load processing and for driving adoption through change management to build a foundation for advanced analytics (machine learning, data science, leveraging AI) in the future.

Counting Faculty – How One College Made It Easy
Margaret Allen and Steve Papaccio, Bowdoin College

Clearly defining and modeling academic personnel data, which often are managed by Academic Deans or Provosts rather than Human Resources, is a common challenge. While Academic Deans structure their faculty data sets around budget lines and idiosyncratic faculty contracts, other reporting needs are left unmet due to inconsistencies with Human Resources data structures and a lack of clear definitions.

Bowdoin College needed to migrate data from an annual spreadsheet provided by Academic Affairs into Workday; clarify definitions and requirements; create a quality-control routine; and develop a dimensional data model to store enhanced fields in support of strategic, longitudinal analyses. In this presentation, we will cover key examples and successes of the project, from the governance process to the technical modeling.

We will reserve the last 15-20 minutes for discussion so that the audience has a chance to engage with the topic. If the room supports it, we will start with round-table discussions on the most important action needed at other institutions to improve faculty data, and then have the tables report out the most relevant ideas.

Through this session, the audience will gain a better understanding of the potential challenges with academic personnel data, ideas on how to collaborate with departments to resolve them, and an example of how one institution modeled faculty data using Workday as a source.

Creating Better User Groups with ACM and Grouper
Julie Parmenter, Indiana University

Everyone creates user groups, but few of us do it well. Those of us working in the business intelligence space create thousands of user groups for access and authorization. Groups are manually created and managed on numerous platforms, including some with very complex user interfaces, with no standards, consistency, or audits. Many of these groups contain users who should no longer have access due to termination, retirement, job change, or expired compliance credentials. This presents a significant security risk. The loss of productivity from the number of people manually managing groups on multiple platforms in many different ways is also quite high.

Access Control Management (ACM) is software developed at Indiana University that allows for the creation and management of user groups and serves as an easy-to-use interface to the role-based management tool Grouper. ACM has security features such as the automatic removal of terminated users, built-in compliance checking, and alerting on users with job changes. ACM was originally developed as a method to automate security for Tableau workbooks. However, we quickly realized that the tool could be expanded into a global group management system. In this presentation, we will demo the software and highlight the types of user groups we are building, from local Tableau groups to Active Directory groups to authorization for cloud-based applications.

Data Cleansing using the Google Places API
Charles Rosenberg, PhD, University of Rochester

Students applying for admission to the University of Rochester are required to enter the schools they have attended previously, along with their GPA and degrees received. These data, along with city, state/province, and country, are entered as free text, so data quality is a major issue. The goal of this project was to find the best match in a list of approximately 50,000 schools sourced from the College Board. Using the Jaro-Winkler similarity score on the school name, matches were found for only about 20% of the 110,000 schools entered, and many of the matches were of poor quality. An alternative approach was explored in which schools (school name, city, state/province, country) were looked up using the Google Places API, and the resulting unique identifiers (place_ids) were compared to those for the College Board schools. The resulting match rate was about 85%, which is probably near the theoretical limit given the quality of the data. Run time also scales linearly, rather than with the number of pairwise comparisons required by similarity matching. This talk will present the architecture of the system and its implementation in Oracle Data Integrator.
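The baseline similarity-matching approach can be illustrated with a hedged sketch. Python’s `difflib` stands in here for the Jaro-Winkler score the project actually used, and the reference list and 0.85 threshold are illustrative choices, not the project’s data:

```python
import difflib

# Fuzzy-match a free-text school name against a reference list, returning the
# best candidate only if it clears a similarity threshold.
reference = ["East High School", "West High School", "Central Academy"]

def best_match(name, candidates, threshold=0.85):
    scored = [(difflib.SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

print(best_match("east high schol", reference))      # East High School
print(best_match("Acme Driving School", reference))  # None
```

The Google Places approach in the talk sidesteps pairwise scoring entirely: each entered school is resolved to a `place_id` with one API call, and exact identifiers are compared instead.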

“Data-Driven” Student Success for BI Teams Big and Small
Nate Rochester, Portland State University

“Data-driven” has become a modern mantra of higher education administrators at all levels, and is referenced often as an ideal to strive towards in matters of enrollment management, program design, and other organizational decision making. It has also become a standard expectation that student success initiatives should be “data-driven” in their development and implementation. This presentation explores some of the main challenges and obstacles to achieving this ideal for student success efforts, as well as strategies to overcome them.

I begin by briefly discussing the challenge of defining “data-driven” in the context of student success initiatives. Rather than prescribing a single definition, I describe the flexibility of this term to be defined within the scope of a given initiative and the data and resources available. Data should not amount to mere anecdotes, but neither does it have to reach the level of predictive analytics. Suggestions and examples are provided.

I also discuss the challenge of separation (organizationally and in terms of expertise) between the IT/IR professionals who manage institutional data and develop reporting tools, and the administrators, academic professionals, and faculty members intended as their audience. Several key strategies are presented as ways to address this separation and related challenges in providing data resources for student success. These strategies fall into three broad categories: (1) Bridging the divide between academic units, advisors, and IT; (2) Building appropriate tools for particular audiences and applications; and (3) Developing system integrations that allow reporting tools to engage with academic requirements.

Data Warehouses and Bitmap indexing
Joseph Hutchins, University of Utah

After a few minutes of introduction to the terminology, the meat of the presentation revolves around the metaphor of finding a needle in a haystack. But what if, instead of a nice haystack and only one needle, it’s a pile full of all sorts of detritus (hay, coins, pens, sticks, some chunks of steel, and a few needles), and you’re trying to find a whole category of items?

The trick that data warehouses use is filtering the fact table in one go. If we want to filter our detritus-stack down to only needles in one pass, we could use a bloom filter. A bloom filter in this metaphor would be a container with a magnet and a grate: the grate only lets small items through (hay, a few coins, and the needles), and only the magnetic items (needles, steel) are attracted to the magnet. These filters apply at the same time, so you don’t need to send the pile through multiple steps.

The needle(s) are the rows you want, and the detritus-stack is the fact table. You could go through each piece of the pile individually (a full table scan), but that takes as many steps as there are pieces in the pile. By filtering the entire pile in one pass, you can find your needles much faster, and it takes roughly the same amount of time whether the pile has 1,000 pieces or 1,000,000.

I would also be happy to be part of a panel discussing data warehouse performance.
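The magnet-and-grate metaphor can be sketched as a toy bloom filter in Python. The bit-array size and hash count here are illustrative choices, not anything from the talk; the key property is that membership is tested against all hash positions at once, false positives are possible, and false negatives are not:

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several deterministic bit positions per item by salting a hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # All positions must be set; a clear bit anywhere means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for needle in ["needle-1", "needle-2"]:
    bf.add(needle)
print(bf.might_contain("needle-1"))  # True
print(bf.might_contain("hay"))       # False, with high probability
```

Real databases use the same idea to discard non-matching fact-table rows in one pass before touching the expensive join.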

Dirty Data is costing your organization time, money and incorrect results!!
Madhuban Chaudhary and Sridhar Ravikanti, Emory University

For a successful data warehouse project, data is the key element and forms the foundation on which various business needs are designed and developed. It is of the utmost importance that the data be as clean and as meaningful as possible. At Emory University we have come a long way from simply pulling data from various sources and producing reports from it. Our presentation will describe the methodologies and tools we use to make sure that:
• Data quality is maintained when data is extracted from various sources.
• Data is meaningful and relevant to the business requirement, and not redundant.
• Integrity is maintained across sources and marts for producing cross-functional reports.
• Data is properly modeled for producing analytical, visual, and predictive reports.
• Data becomes the source of truth for enterprise-wide reports.

The presentation will include the ways in which we have automated the process of achieving the highest data quality, the challenges we faced along the data journey and how we overcame them, and how we have bridged the gap between the data warehouse and analytics.

Diving Into the Deep End: How We Built a Digital Education Dashboard Using Cloud Technologies and Agile Methodology
Roland Hall and Katerina Stepanova, Brown University

The project began in mid-May 2019 with the goal of creating a data lake using cloud technology. Our new CIO charged us to execute the project quickly, incorporating agile/SCRUM methodology, which was new to the project team. We were learning the agile methodology while simultaneously getting up to speed with new tools and data concepts.    The project team successfully developed dashboards that provide information on built-in tool usage in Canvas. We developed our data lake infrastructure on Google Cloud Platform, storing data in Google BigQuery and Google Cloud Storage. We used Talend and Python to ingest data. We built our dashboards in Tableau.    We learned to adopt a more modern and agile methodology for building curated data. The power of the cloud lets us deliver speedy performance even when using complex views that integrate data from disparate sources. We can quickly create usable prototypes and rework them as needed. These views can be used to build the data pipeline and dynamically create tables that can later be reverse engineered into a logical model.    We ran into challenges and learning opportunities every step of the way, from using Canvas data, to working in the cloud to developing effective visualizations. We will describe our process and talk about our lessons learned and where we are now.

Doing Regression When Your Dependent Variables Aren’t Well Behaved
Abby Kaplan, Salt Lake Community College

We will review three types of regression for dependent variables that are not well behaved, applying each to a sample dataset of LMS assignment submissions: beta regression to predict grade (expressed as a percent), ordered logistic regression to predict letter grade, and multinomial logistic regression to predict how the student submitted the assignment (text entry, file upload, URL, etc.). The sample dataset and the R code for each regression will be available for participants to download.

EDW Operational Excellence
Gregory Locke, Yale University

This talk will cover the following topics:

Data Vault Architecture: a fast, flexible, adaptable, and scalable data warehouse platform that supports agile methodology.

Business Layer Stabilization: how to support 24/7 data access for data services and reporting.

Process Orchestration and Monitoring: automated restart and recovery, load performance trending, and SLA monitoring.

ETL for Power Users?  Yes, and we will show you how
Kristin Kennedy, Arizona State University

In the history of analytics and business intelligence, the process of extracting, transforming, and loading (ETL) data has been owned by centralized IT, Institutional Research, or some rogue department. As the world of analytics continues to evolve, it is becoming harder and harder for these groups to keep up. Between an abundance of new data sources from disparate systems, a departure from traditional databases for applications, and ever-increasing demand, it is becoming impossible to meet customers’ needs. At Arizona State University, we have a large group of sophisticated power users who have been meeting the needs of the business in creative and agile ways for many years. We decided to help them do what they have been doing, better and more easily. We brought in tools such as Alteryx and Aurora PostgreSQL so that these subject matter experts can be our partners in getting data out to the business in an enterprise-grade, sustainable way. This session will cover how we did this and share the victories and challenges of the implementation. We will share our architecture as well as some ideas of the costs and skills needed, from both IT and the power users.

From Big Data to Free Computer Rentals: Transforming Data to Enable Student Success
Corinne Briggs and Russell Youngberg, Brigham Young University

In order to enable more proactive interventions leading to student success, we needed student-centric data. Learn how Brigham Young University took application-centric data from multiple sources, transformed it into student-centric tables to be consumed by machine learning models, and then transformed the input and output data into a format consumable by end users. We are still working through the challenges of combining multiple data sources, but in just a year we have built a reliable and resilient Early Alert system that allows proactive solutions to individual student challenges. The data structure we created is also scalable, so that it is useful to other projects now moving forward and allows the addition of new data sources.

From Statistical Significance to Practical Application: Communicating Predictive Results between Analysts and Stakeholders
Heather Chapman, Weber State University

As predictive analytics and machine learning become more and more popular across higher education, the importance of translating findings from these complicated analyses is critical. Typically, those requesting the analyses and those conducting them come from two different worlds, and have very different expertise. Stakeholders often have a very good understanding of the messy issues that exist, but often have a hard time translating that into questions for a data scientist to answer. On the other hand, data scientists may have the technical expertise to run the analyses but often lack the real-world application experience to translate findings into a language the stakeholder can understand. The full power of predictive analytics results cannot be reached without addressing this mismatch in expectations and expertise. This presentation provides practical tips for both stakeholders and analysts to bridge the gap when reporting results.     Specifically, this presentation will attempt to provide answers to important questions such as: What makes a good research question? What are some of the most common issues associated with variables used in higher education research? What are the critical components that should be provided when reporting on predictive analytics?      As part of the session, participants will receive a step-by-step guide for both requesting analyses as well as providing results.

Growing analytics capabilities at a small liberal arts college for a data informed future
Dobby Spencer, Meghal Parikh, and Bay Rodriguez, Rollins College

Traditionally, institutional data in higher ed was derived from a small number of guarded sources. This data was ETLed into data marts or relational databases for reporting. Data stewards could refine the data up front, supplying a limited set of data elements for analytics, and data governance emphasized data definition management, mostly on the reporting side. Today, institutional data is distributed across multiple systems, much of it is unstructured, and alignment with an Enterprise Data Warehouse (EDW) is complex. When data is sourced from multiple systems and is continually refined at every stage on its way to analytics, governance becomes a business-process-driven exercise entailing what we call data lifecycle management rather than data definition management. Data stewardship involves managing the usability of data and increasing the value of the data domain. Data monitoring responsibilities shift from stewards to technology managers, who become the caretakers of data quality.

In 2017, data analytics became a strategic priority to inform complex, timely decision making at a small college. The strategic goals included modernizing data technology services, deploying an integrated data warehouse, developing a self-service BI architecture, and improving data management. Execution of these goals hinges on a modernized, distributed data governance framework with re-imagined business processes, support for multiple systems, and new institutional reporting standards; all disruptive to the existing conventional data culture at the institution. This presentation will share our progress toward a modern vision for institutional analytics, leveraging external consultants, a revitalized data technology portfolio, and strategic hiring of dedicated data professionals.

Helping Departments Begin Using a Data Dictionary: Make it useful and the process automated
Susan Schaefer and Joseph S. Hutchins, University of Utah

This presentation will go over the philosophy of “just in time” documentation, which helped us get departments on board, as well as the technical aspects we used to create an automated data dictionary process. Technology currently utilized for this process:
Data Cookbook – data dictionary
StreamSets – API/ETL tool
Oracle Database – Oracle dictionary views (connect definitions to database fields)
Tableau – current reporting tool (a SQL/Python script reviews Tableau sources and brings in the definitions connected to those sources)

The idea behind our process can be applied to other tools, but the code we will be sharing was written specifically to work with the tools above. Beginners will gain an understanding of our process history and current implementation, with simpler code, as well as the arguments and discussions we had with departments to better implement this change. Joseph will cover some of the more technical aspects of the automation process at a high level, with the code available for intermediate-level attendees. (For the automation, we use Python code in conjunction with some SQL code and functions.) We end with our perspective on how to bring definitions into dashboards and reports.
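The core join behind this kind of automation, attaching dictionary definitions to the fields a report uses, can be sketched generically. The table and column names below are made up, and in the real process the definitions come from Data Cookbook and the field list from Oracle dictionary views and Tableau sources:

```python
# Hypothetical data dictionary export: (table, column) -> business definition.
definitions = {
    ("STUDENT", "GPA"): "Cumulative grade point average at end of term.",
    ("STUDENT", "LEVEL"): "Academic level (undergraduate/graduate).",
}

# Fields surfaced by a hypothetical report; one has no definition yet,
# which is exactly the gap "just in time" documentation is meant to surface.
report_fields = [("STUDENT", "GPA"), ("STUDENT", "MAJOR")]

documented = {f: definitions.get(f, "** definition missing **") for f in report_fields}
for (table, col), definition in documented.items():
    print(f"{table}.{col}: {definition}")
```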

How do we maintain applicability in a new analytics environment and capitalize on citizen data scientists across the university to meet advanced analytics needs?
Ashley Hallock and Dimuthu Tilakaratne, University of Illinois

The University of Illinois System is actively working toward meeting our changing needs in data and analytics. Through a use case gathering mission, consultations with other Big Ten universities, previous HEDW presentations, and discussions with Gartner Analytics, we prepared a recommendation and foundation for a System-wide Advanced Analytics service. We would like to share our plan forward and discuss our successes and opportunities.

The first of our efforts is to allocate resources to maintain the existing Data Warehouse and BI/Data Visualization solutions while modernizing our tools, storage, and processes. Success in this effort includes re-engaging our current community of reporters and analysts who have built extensive reporting and visualization solutions using our platform. The second effort is to strategically delve into advanced analytics using an agile framework created by our Innovation team. This is done with virtual teams created around projects to answer the next generation of questions from our use case list. We are engaging our end users in an effort to create an environment of Community Data Scientists who can discuss everything from university data to software to the next big model.

Integrating Python into Analytics
Ryan Fellers and Ashtin Preston, University of North Texas

Our office, which sits between the Division of Finance and the Division of Enrollment Management, has come to use Python in multiple projects. While Python is not a traditional database tool, we have found it very useful for integrating different data sources (text data, time series, categorical, etc.) and easily transforming them for data analysis and predictive analytics. We’d like to show the audience how simple it can be to use in the traditional BI sense, while briefly mentioning its use in machine learning. Python is easy to learn (barely more complex than SQL), and we plan on using Google Colab to share our code live, which can be run interactively during the session for the audience to follow along. None of this is required to participate, but Python (especially in the scope of Google Colab) is free and easy to use. We plan on sharing live how we have used it at UNT. Our hope is that the audience walks away with the keys to a new skill set, along with resources for them to learn more if they desire.
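The session will use UNT's own notebooks; as a generic taste of the kind of "traditional BI" integration described (the data below is invented), joining a categorical source to a time-series source in pandas takes only a few lines:

```python
import pandas as pd

# Hypothetical example: join enrollment records (categorical) with a
# small time series of application dates, then summarize by college.
enrollment = pd.DataFrame({
    "student_id": [1, 2, 3],
    "college": ["Business", "Music", "Business"],
})
applications = pd.DataFrame({
    "student_id": [1, 2, 3],
    "applied_on": pd.to_datetime(["2020-01-05", "2020-02-10", "2020-01-20"]),
})

# Merge the two sources on the shared key, then aggregate.
merged = enrollment.merge(applications, on="student_id")
by_college = merged.groupby("college")["student_id"].count()
```

The same `merge`/`groupby` pattern extends naturally to text and other source types, which is much of what makes Python attractive next to SQL for this work.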

Interactive Dashboards and Reports
Jason Jensen, University of California – Irvine

Overall, we needed to come up with a suite of reports/dashboards that were dynamic, easy to use, and conveyed the most useful information within the space that a webpage can afford. We also did not want to overwhelm users with 50 to 100+ separate reports to deliberate over. The goal was to provide a few key areas of focus (graduation, retention, enrollment, and major migration) and a dynamic interface for each key area; an interface that would allow the user to explore the data within the particular metric’s confines and provide potent visual representations alongside precise, tabular data. A big component of this design was to engineer the reports with the ability to collapse, expand, and position student characteristics on the fly, along with presenting the data either as student counts or as comparable distributions/percentages. A secondary objective was to have each area of reports function similarly, so that a user familiar with, say, the graduation and retention reports would be able to operate the enrollment reports in basically the same manner, thereby reducing the learning curve and maintaining a high level of user acceptance. The general structure of this presentation will cover the basic reporting metrics and how the corresponding dashboards operate. Sample runs of the reports will be displayed and discussed. Key scenarios will be chosen to demonstrate how the reporting system can and does lead to decisions that better the public/student good.

Let’s explore IR’s function as “sensemakers” in the era of machine learning!
Yuko Mulugetta and Andrew Siefert, Ithaca College

Last summer, an EDUCAUSE-NACUBO-AIR joint statement called for urgent action to better leverage analytics to improve higher education. IR’s role as “sensemakers” should be at the center of analytics actions, including asking mission-critical questions; developing and interpreting analyses correctly; revealing data insights; and telling data-informed stories. Ithaca College has been exploring this function by developing a cloud-based, just-in-time student retention analytical framework. Effecting proactive measures to improve retention requires not only predictive modeling of the retention probabilities of individual students, but also that those predictions be prescriptive and produced early enough to change student behaviors. But Ithaca has realized that the more we rely on “black box” machine learning to improve prediction accuracy, the more challenging it becomes to articulate a prescription for changing student behaviors. This session will first present the development of the retention model, including the tools, techniques, and thought processes. This will be followed by a discussion of moving the model into the cloud, including an explanation of the technology implementation that automates daily evaluation and predictions. Lastly, we will interactively discuss how to optimize the interpretability and prescriptive power of retention analytics as we increasingly rely on “black box” machine learning algorithms for more accurate predictions. We propose the SHAP (SHapley Additive exPlanations) approach: measuring the contribution of each variable to the overall prediction for each student and displaying the top negative and positive indicators for each student based on SHAP values. The session will explore IR’s role as “sensemakers” in this new era.
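As a toy illustration of the SHAP idea (not Ithaca's actual model; all coefficients and students here are invented): for a plain linear model, each feature's SHAP value reduces exactly to its coefficient times the student's deviation from the cohort mean, which is what lets the contributions be ranked into per-student indicators:

```python
# Illustrative sketch only. For a linear model, the SHAP value of a
# feature is exactly coefficient * (student's value - mean value), so a
# prediction decomposes into per-feature contributions that can be
# ranked into each student's top negative and positive indicators.

def shap_linear(coefs, means, student):
    """Per-feature SHAP contributions for a linear model."""
    return {f: coefs[f] * (student[f] - means[f]) for f in coefs}

coefs = {"gpa": 0.8, "absences": -0.25, "credits": 0.1}   # hypothetical weights
means = {"gpa": 3.0, "absences": 4.0, "credits": 15.0}    # cohort means
student = {"gpa": 2.5, "absences": 10.0, "credits": 15.0}

contrib = shap_linear(coefs, means, student)
# Rank features from most negative (risk factors) to most positive.
ranked = sorted(contrib.items(), key=lambda kv: kv[1])
```

For nonlinear "black box" models the decomposition is no longer this simple, which is where the SHAP library's estimators come in, but the additive per-student breakdown it produces has the same shape as this sketch.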

Let’s talk data! Voice-based Apps and Augmented Analytics
Amber MacKenzie, University of Michigan

This presentation will showcase the work that the University of Michigan has been doing to bring augmented analytics and voice-based applications to our students, faculty, and leadership. There will be live demos of the tools, discussions of how we incorporate voice technology within our suite of Business Intelligence offerings, and what it took to reach this point. What if you could update your timesheet, or ask about your vacation balance, by talking to your phone? What if you could instantly join data from social media, textbooks, spreadsheets, and enterprise databases to find correlations you didn’t realize even existed? What if you could predict and react to enrollment trends before they happen? Let’s discuss what we are doing today, and what is possible tomorrow!

Modularizing Data with dbt
Jacob Mastel, Oregon State University 

My plan is to give a high-level overview of our new data pipeline in AWS and Snowflake, and how dbt fits within that pipeline. Then I plan to go over the principles of dbt. Once the audience has an understanding of what dbt is and its general features, I’ll walk through a real-world example that we are actively using, including small code samples.

Practical Steps in Building a Data Culture – One Conversation at a Time
Karen Menard, Ann Cesar, and Rob Heenan, University of Guelph, Canada

The University of Guelph has been developing and implementing an institutional data strategy for the past two years. Driven by a need for improved access to information for decision-making, we started our journey by evaluating our current state, seeking valuable input from other institutions and experts, engaging with key stakeholders across our campus, and hearing stories of success and failure, in order to outline a data strategy plan that we believe will be successful. As much as we would like to see progress come more quickly as we deal with the pressures of the ever-changing higher education environment in Ontario, Canada, we are making progress, and we see and hear this through hallway conversations and meetings every day. They say ‘change happens one conversation at a time’, and we believe our data strategy is already changing the data and information culture. If you are just getting started on your journey to drive this transformational change at your institution, we will aim to help simplify your process by ensuring you are engaging the right people and having the right discussions, and by providing some ideas around organizational structure and implementation that we find are the right steps to enable the success of the program. We will share our approach and journey in an interactive format to allow audience members to get the most out of our learnings.

Python for Data Science: How to leverage your EDW
Chris Frederick, University of Notre Dame

1) Introduction to Python for Data Science
2) Introduction to Jupyter Notebooks
3) Demonstration of how to connect to an EDW and execute a SQL query
4) Demonstration of how to perform Exploratory Data Analysis with Python
5) Demonstration of how to perform a variety of modeling techniques (possibilities include linear regression, neural nets, etc.)

I’ll figure this out as I put together the presentation to ensure that it’s both a powerful and accessible demonstration.
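Steps 3 and 4 might look something like the following minimal sketch. SQLite stands in for the EDW here so the example is self-contained; against a real warehouse you would swap in the appropriate driver (e.g. cx_Oracle, psycopg2, or an SQLAlchemy engine), and the table and figures below are invented:

```python
import sqlite3
from statistics import mean

# SQLite stands in for the EDW so this sketch runs anywhere; in practice
# the connection would use your warehouse's own Python driver.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (term TEXT, headcount INTEGER);
    INSERT INTO enrollment VALUES ('FA19', 8500), ('SP20', 8100);
""")

# Execute a SQL query and pull the results into Python.
rows = conn.execute("SELECT term, headcount FROM enrollment").fetchall()

# A first bit of exploratory analysis on the result set.
avg_headcount = mean(h for _, h in rows)
```

From here, loading `rows` into a pandas DataFrame inside a Jupyter notebook is the usual next step for richer EDA and modeling.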

Rethinking Data Governance: A Journey of Nudging towards Data Governance in Higher Ed
Maya Collum, Kelly Raposo, and Christine Beckermann, University of Toronto, Canada

To some degree, we are all struggling with the same data governance questions in HE: Who should have access to what, and in what way? What are the official institutional definitions and terms of use for data? How do we ensure compliance in a large, decentralized, multi-campus institution? Join the presenters on our journey from siloed information structures to meaningful, collaborative data governance. We will share how we did it quickly and without copious resourcing. Spoiler alert: it wasn’t complicated, and agile was involved!   The lessons we have learned along the way provide a blueprint for institutions to rethink data governance – building something simple but effective – and, at the same time, building an invested, cross-cutting community of data users. We’ll share effective strategies for operationalizing a consultative process that reaches experts, users, and decision-makers across the institution.

Show me the data! (…that all this was worth it)
Steve Sporinsky, University of Notre Dame

A few years ago, Notre Dame dove headfirst into the vision behind that trendy phrase: “…a culture of data-driven decision making”. We stood up a robust EDW and reporting/analytics environment (in our case, Tableau). We branded ‘dataND’ (still growing today). We planted the seeds for campus data workers to shift to a new way of working (and some have). We instituted a data governance program and a complete data competency support model (whoops! scratch that word ‘complete’). We even implemented a security model that limits the sensitive stuff to the right people. We ‘freed the data’. Good, talented people have done some amazing things.

And then it happened: the boss asked us to show the value of our services. Challenged with how to show the impact of our service offerings, we needed to decide what measures matter, whether we have the right data (including the less tangible ‘cultural’ things), and how to turn all that into a value story. How do you measure a data governance program’s adoption? …Tableau’s value? …your efforts to drive a stake through the heart of outdated, stale source-of-truth spreadsheets in literally thousands of Google Drives? We hope ND’s examples will promote an exchange of even more ideas, to help us all become better in our quest for practical metrics for the adoption of data analytics and reporting service offerings. After all, we all have a job to do, don’t we? …drive our institutions toward a culture of data-driven decision making.

The Dream House that Data Governance Built
Barry Goldstein, Washington University in St. Louis

Data Governance can take a lot of different paths, options, and capabilities. As when you are building a house, the options make most people’s heads hurt. You want everything, but can only do so much. In this presentation we will look at the major steps needed to build your house and how those parallel the Data Governance path. Along the way we will discuss helpful tools and tips, covering the people, processes, and tools involved throughout DG construction. Topics will include:

• Funding / Resources – Executive investment / commitment / approval?
• Vision / End State – What is success? What are the milestones?
• Find a Building Site – Where will we build? Does it matter?
• Architecture / Blueprints – Your plan and timeline?
• Prepare the construction site and pour the foundation – Your DG framework, stewardship structure, and other foundational pieces.
• Construct rough framing – The policies, standards, guidelines, and guardrails.
• Complete rough plumbing, electrical, and HVAC – The processes: how this stuff will work, how data will flow through its lifecycle, and how data is created, stored, used, and disposed of.
• Finish interior and exterior – What tools do we need? Can we customize and adjust?
• Walkthrough of the finished house – How will we maintain and sustain DG?
• Host an open house – Show it off / show the value you add. Communication and marketing.
• Enhancements / Future projects – What are the next phases?

The Report Prioritization Group: The Key to a Successful BI Team
Matt Portwood and Alexis Wall, University of Washington

In this presentation, the UW-IT Business Intelligence Team will provide the history and background of our Report Prioritization Group at the University of Washington (Seattle). This presentation will feature two presenters. Matt Portwood (Business Intelligence Team Manager) will provide context for the UW BI landscape and will examine the highlights and lowlights of the RPG’s history. This part of the discussion will focus on how to get and maintain buy-in from campus leadership. Alexis Wall (Senior Business Intelligence Engineer) will explain what the RPG does in and out of meetings, and how we deliver and communicate about our work. This talk will have an emphasis on implementation, and we want to encourage other BI teams to borrow the RPG model and implement it at their institutions. In sum, this presentation will demonstrate how we leverage this group of RPG stakeholders to coordinate change requests, collect feedback, confirm definitions, get buy-in, deliver updates, and communicate changes to our large and growing community of users.

Three Pillars of Data Wisdom: Davidson’s Approach to Building Self-Service BI
Pete Benbow, Davidson College

In the last two years, Davidson College has made enormous strides in securing the Holy Grail of self-service BI across campus. Starting from a place of numerous siloed data sources and a dizzying array of integrations run via cron jobs, Davidson has fashioned a long-term BI and integrations strategy built on three pillars:

• SnapLogic, an integration platform-as-a-service (iPaaS) that allows central IT and non-IT developers to build pipelines that not only move data from point to point, but can also render REST API endpoints.
• Power BI, our primary reporting and analytics platform, which puts the power of complex data modeling, rich visualization, and AI within reach of all users on campus.
• The Enterprise Data Warehouse (EDW) and data lake, our centralized repositories that provide highly structured and highly governed data alongside raw and unstructured assets.

In this presentation, we’ll describe the evolution of Davidson’s BI capabilities from past to present, the reasons Davidson chose these platforms to support our strategy, the challenges we conquered and those we deferred, and how each pillar contributes to the goal of building a campus-wide culture of “data wisdom.”

To know is to learn what you have
Lance Tucker, Boston College

Boston College, like many schools, has hundreds of applications and databases used to manage information. Less sophisticated electronic data stores (e.g. Google Sheets) may also serve as systems of record for some business processes. Preferred solutions are often chosen and managed by multiple groups or departments within the University. In 2017, we began cataloguing our information sources. To date, we have developed metadata for over 600 items across nine vice presidential areas. An application was developed to complement this collection; it houses and manages the records and allows a user to visualize and report on them.

This presentation will cover how we identified the metadata, the process used to collect the information, and how the information can be utilized. While it has taken considerable time to research and collect the data, managing the information is fairly easy and provides a good tool for IT managers to collaborate with their customers. The presentation also covers how we accomplished this effort with minimal staff resources and cost. We are now in the process of demonstrating the value this information can provide.

Like many of you, when we began our information governance program, we dove right into terms and data dictionaries. This approach proved to be a sizable challenge: it was difficult to develop the business process and garner participation to support and sustain the effort. We learned that starting on higher ground, with an information catalog, is a better approach to introducing staff to governance practices and the value of curated management.

Upgrade or Trade-in: Navigating your data warehouse through a major change
Ravindra Harve, Boston College

For the past several years, Boston College has been converting from a mainframe-based student system that served the University for nearly forty years. The new system is a service-oriented architecture with a code foundation supplied by the Kuali Consortium. The approach has been to implement this conversion iteratively, module by module: student accounts, financial aid, course catalog, enrollments, and other subject areas. Implementing the new transactional system has been a challenging task for the University, as the data dependencies between the student system and other ERP and departmental applications are considerable, as are dependencies within the student system itself.

Data warehouse developers were challenged with how to keep historical reporting relevant when the new source system is very different. The analysis involved going through thousands of tables, new column names, and new business logic. This presentation will explore the process used for developing a strategy to accommodate the significant change. We will present the approach the EDW team used: discussions with stakeholders, negotiations with application developers, and how to build timelines and propose solutions. We will also cover how to use information governance tools to assist with developing solutions and to help navigate the technical details of adopting a new environment. This effort is still a work in progress, but we will share what has worked to date and what has been problematic. After trial and error, we will also discuss the approaches used to open developer communication channels, work with multiple project managers, and vie for resources and funding.

Usage Tracking, or, I Always Feel Like Somebody’s Watching Me
Greg Brown and Theresa Sherwood, Bowling Green State University

BGSU experienced performance problems on both their analytical reporting platform (OBIEE) and transactional reporting platform (SAP Business Objects/Crystal Reports). In order to efficiently troubleshoot issues, we decided to enable usage tracking in both systems and create dashboards in Power BI to quickly find data related to the issues. Information such as usernames, report names, and date and time of event are captured and reported on. Come hear our story and view examples of our usage tracking dashboard content.
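The presenters will show their Power BI dashboards; as a generic sketch (the event names and users below are invented), the kind of aggregation such a dashboard is built on, once usernames, report names, and timestamps have been captured, looks like:

```python
from collections import Counter
from datetime import datetime

# Hypothetical usage-tracking events captured from a BI platform:
# (user, report name, run timestamp).
events = [
    ("jsmith", "Enrollment Summary", datetime(2020, 3, 2, 9, 15)),
    ("jsmith", "Budget Detail",      datetime(2020, 3, 2, 9, 40)),
    ("mdoe",   "Enrollment Summary", datetime(2020, 3, 3, 14, 5)),
]

# Which reports are run most often? The same pattern answers
# "which users are most active" or "which hours are busiest".
runs_per_report = Counter(report for _, report, _ in events)
top_report, top_count = runs_per_report.most_common(1)[0]
```

Feeding aggregates like these into a dashboard is what makes it quick to correlate a performance complaint with the specific reports and time windows involved.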

Using Data Virtualization to break down data silos and solve other data problems
Richard Hanks and Roger Tervort, Brigham Young University

When I arrived at BYU, the decentralized storage and use of data was a challenge: many data silos, a lot of replicated data, challenges in securing and delivering data outside of applications, multiple database platforms, and the need for customers to use and update their own data in reporting environments. By introducing data virtualization (DV), we have made inroads in solving these problems. Some of the data silos now use DV to access their own data, and when they have approved Data Sharing Agreements, the requested data is simply given to them by changing access permissions. This has saved many time-consuming ETL processes, avoided additional replication of data, and now provides a more central point for campus to access their own and enterprise data. My team was spending a lot of their ETL development time copying tables from one database platform to another; DV has eliminated much of this light ETL work and data replication and has allowed users to access data at its source. Our security team receives JSON and CSV files from our sister campuses, and through DV both of these types of data files can be used as tables for SQL queries, enhancing their architecture. Campus departments with specific faculty roll-ups can load their own lookup files or roll-ups into DV and use them with other reporting data. The Library has anonymized some of its patron data and, through DV, made it accessible to other areas of campus for analysis. The Library also uses the faculty data to fine-tune the databases it subscribes to, making sure they include the publications and databases where our faculty publish. These are just a few examples of how data virtualization can assist distributed environments and help customers get the data that they need.

We Don’t Have Suites: Tackling master data in an institution of competing business priorities
Cynthia Carlton, University of Rochester

In 2017, when several enterprise projects surfaced the need to have consistent data for leased and owned buildings, the University of Rochester sailed headfirst into the uncharted territory of master data. We brought together a working group made up of individuals representing data domains and technology systems across the institution. We gathered input from anyone who would talk to us. We benchmarked industry standards. We figured out what we needed and who we needed, and recommended how we could accomplish the goal. “This won’t be that difficult!”, we naively declared to ourselves.

This presentation will tell the story of our master data journey with location data: from the initial working group, to the implementation, to the inevitable course corrections. We’ll share the places we’ve been successful, what we learned from the challenges we encountered, and where we’re headed next. While this voyage hasn’t been smooth sailing, it’s certainly an adventure you don’t want to miss.

What a long strange trip it’s been: Tales of a First Chief Data Officer
San Cannon, University of Rochester

Data governance is an often discussed but not often understood practice. While common to many industries, most of the applications and guidance available for starting and sustaining data governance programs are better suited to profit-driven corporations than to mission-driven universities. This presentation will cover the drivers that led to the hiring of a chief data officer, as well as the progress made and challenges faced when setting up a more formal data governance program within a higher education culture.

What Gets Measured Gets Managed: How Good Data Practices Support Diversity Goals
Meghan Hall, Brown University

The majority of colleges and universities are rightly addressing the diversity of their personnel—faculty, staff, and students—and Brown is no exception: as part of the most recent strategic plan, the university announced in 2016 an ambitious goal to double the number of tenure-track faculty from underrepresented groups. A goal like this, which conference attendees from various schools should be able to relate to, requires tremendous resources, and as funds are being allocated to these diversity goals, attention needs to be paid to the data practices that are being used to measure them. Such important, high-profile goals deserve and require robust data management procedures, and this presentation will discuss lessons learned from the beginning stages of establishing the goal through the processes of tracking and maintaining progress. This presentation will also highlight Brown’s experiences with data governance, including the creation of data definitions, the establishment of a data-informed culture, and the importance of institutional buy-in. For example, we quickly found a gap in our data management processes, in that we were not adequately tracking the demographics of applicants for faculty positions, and due to senior leadership support and preexisting data management practices, we were able to deploy solutions quickly.

When it comes to BI Tools, Is More Better?  A panel discussion on the pros and cons of using multiple BI tools within an institution
Dave Pecora, Rochester Institute of Technology

We will begin the panel discussion with a brief introduction of the topic (described above), followed by an introduction of each participant and their respective university. Some of the questions we will explore in this panel discussion include:

1) Does your school use multiple BI tools? What are the advantages and disadvantages of having multiple BI environments?
2) Are other schools ditching big-box BI vendors for tools like Tableau? What are the driving factors for this? What are the potential pitfalls?
3) Multiple environments are more costly to maintain. When is this worth the cost? When is it not?
4) Do multiple environments make BI and self-service analytics easier to adopt by allowing choices, or more difficult to adopt by making the environment more complex?
5) What other approaches are being used? Reporting portals, other hybrid approaches, etc.

Ample time will be allowed to take questions from the audience, and to allow schools in attendance to participate in the discussion.

Where’s your Dev at? – Managing multiple environment architecture development and maintenance
Matthew Fields, University of North Texas

The department of Data, Analytics, and Institutional Research (DAIR) at the University of North Texas (UNT) manages two legacy data warehouse environments with a third environment in ongoing development. Coinciding with the ongoing development is a full environment upgrade project that brings the hardware, overall system configuration, and software products (back end and front end) up to the more current feature sets offered by the vendor. The Data Modeling and Integration team faces challenges addressing the various needs of these environments. As the manager of the team, I’m constantly torn between reviewing the performance and nightly execution of both legacy systems, planning and meeting for multi-departmental discussions on the details of the upgrade project, and continuing to direct and participate in developing elements of the new warehouse environment. I have the benefit of strong project management and technical support from our IT partners, well-versed and capable consultants from our vendor, and talented employees. Although they all provide beneficial assistance, issues inevitably arise that require weighing the impact on legacy systems against ongoing development. This presentation intends to examine the management of factors that drive decision-making for development in ongoing, mission-critical environments. It seeks to view changes in light of time management, the severity of institutional needs, and the impact on ongoing partnerships with colleagues within and outside the department. Successes and failures will be discussed.