January 5 – January 22, 2021

The Research Computing Services group invites you to attend its first Research Computing Boot Camp!

The Research Computing Boot Camp gives you the opportunity to focus in more depth on several programming topics that are very useful for high-performance computing, including machine learning. A single data set will be followed throughout the progression so that attendees can learn more about the tools and methods for handling data. The Boot Camp begins with three programming tracks (Python, R, MATLAB), each following the same 4-part progression. These prepare all attendees for the 2-part ArcGIS Online session, in which the same data set will be further examined and mapped.

In addition, advanced technical staff from MathWorks (MATLAB), Wolfram (Mathematica), and NVIDIA will be offering exciting, highly useful sessions.

Registration notes:

  1. All sessions will be held over Zoom and registration is required to receive a Zoom link.
  2. Register once for a 4-part programming track. The schedule was designed so that you may attend multiple tracks if you wish.
  3. ArcGIS Online requires an account, and we are creating accounts for those who need one. Register for either the ArcGIS “have account” or “need account” session as appropriate. Note that we need one week of lead time to create accounts, so register early!
  4. Register for each vendor session you would like to attend.

This Boot Camp precedes our usual RCS Spring Tutorials, which will start on January 25; the schedule will be announced in early January.

Boot Camp Schedule

The R, Python, and MATLAB programming tracks all follow the same 4-part progression and their common track descriptions are provided below the language descriptions.

Session Name | Date | Time | Registration Deadline
R (Hands-on) | Tuesday, Jan. 5; Thursday, Jan. 7; Tuesday, Jan. 12; Thursday, Jan. 14 | 10 am – 12 pm each day | Noon, January 3
MATLAB (Hands-on) | Tuesday, Jan. 5; Thursday, Jan. 7; Tuesday, Jan. 12; Thursday, Jan. 14 | 1 pm – 3 pm each day | Noon, January 3
Python (Hands-on) | Wednesday, Jan. 6; Friday, Jan. 8; Monday, Jan. 11; Wednesday, Jan. 13 | 1 pm – 3 pm each day | Noon, January 4
ArcGIS Online (Hands-on) | Tuesday, Jan. 19; Thursday, Jan. 21 | 10 am – 12 pm each day | Noon, January 12
MATLAB w/ Python (Lecture) | Friday, Jan. 15 | 10 am – 12 pm | Noon, January 13
MATLAB w/ ML (Hands-on) | Tuesday, Jan. 19 | 1 pm – 3 pm | Noon, January 17
MATLAB C&C (Lecture) | Wednesday, Jan. 20 | 1 pm – 3 pm | Noon, January 18
Mathematica Programming (Hands-on) | Thursday, Jan. 21 | 1 pm – 3 pm | Noon, January 19
Mathematica and Wolfram Language (Lecture) | Friday, Jan. 22 | 1 pm – 3 pm | Noon, January 20
NVIDIA: GPU accelerated data science using RAPIDS (Hands-on) | Friday, Jan. 15 | 1 pm – 5 pm | Noon, January 14

Boot Camp Track Topics

R Language Track (Hands-on)

Instructor: Katia Bulekova (ktrn@bu.edu), BU Research Computing Services

R is a free and open-source programming language with a primary focus on statistical computing and data analysis. As such, it has a large suite of tools that are well suited for statistics and data science. This suite can be further extended with third-party packages from repositories such as CRAN.

MATLAB Language Track (Hands-on)

Instructor: Josh Bevan (jbevan@bu.edu), BU Research Computing Services

MATLAB is a commercial programming language from MathWorks. Originally developed for linear algebra and numerical computing, it now has wide applicability and functionality for many domains, including statistics and data science. It is possible to write comparatively fast and succinct programs in MATLAB thanks to a large built-in library of functions, which is further expanded by many proprietary and third-party toolboxes.

Python Language Track (Hands-on)

Instructor: Brian Gregor (bgregor@bu.edu), BU Research Computing Services

Python is a free and open-source programming language with a focus on ease of use and readability. Python has a comprehensive standard library, further expanded by a huge collection of third-party packages that can be easily installed from package repositories such as PyPI. These packages add a wealth of functionality, including notable packages for numerical and scientific computing, statistics, data science, and more.

R, MATLAB, and Python Track Descriptions

Each of these tracks is a 4-part progression, and you register once for all four parts as a unit. We strongly encourage you to also register for the 2-part ArcGIS Online sequence, which follows the programming tracks and continues with the same dataset.

Part One – Programming and Data: Language Basics and Example Dataset

Analyzing a large or complex set of data requires computational techniques implemented in a programming language. Part One introduces programming language basics and useful features. Concepts like data types and structures, syntax, loops, and control flow are introduced. How to use the Integrated Development Environment (IDE) is also demonstrated. An example dataset is introduced that will be used throughout all parts.

Part Two – Data Processing and Handling

To work with a dataset, various operations are needed to bring it to a usable state within a program. Part Two walks through this process, covering reading in data from files, cleaning the data, formatting the data, and manipulating the data in memory. Various data structures will be discussed alongside their trade-offs.
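
As a rough illustration of the kind of steps this part covers (not the actual workshop material), here is a minimal Python sketch using only the standard library. The inline data, the column names, and the `load_clean_rows` helper are all made up for this example; the Boot Camp dataset itself is not reproduced here.

```python
import csv
import io

# Hypothetical stand-in data; not the workshop's dataset.
RAW = """county,population,uninsured_pct
Adams, 12000 ,8.5
Brook,,11.2
Clark,45000,n/a
Dover,30500,6.1
"""


def load_clean_rows(text):
    """Read CSV text, strip whitespace, and drop rows with missing or non-numeric values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({
                "county": record["county"].strip(),
                "population": int(record["population"].strip()),
                "uninsured_pct": float(record["uninsured_pct"].strip()),
            })
        except ValueError:
            # Empty ("") or malformed ("n/a") fields fail conversion; skip the row.
            continue
    return rows


rows = load_clean_rows(RAW)

# Manipulate the data in memory: sort by population, largest first.
rows.sort(key=lambda r: r["population"], reverse=True)

for r in rows:
    print(f'{r["county"]:<8}{r["population"]:>8}{r["uninsured_pct"]:>6.1f}')
```

Dropping incomplete rows is only one possible cleaning policy; depending on the analysis, filling in or flagging missing values may be more appropriate.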

Part Three – Data Visualization: Plotting and Graphics

Examining complex patterns and correlations in data requires visualization to present the results in a digestible form. Part Three presents various ways to plot and visualize data. Different plotting techniques and formats are presented, along with various ways to format and configure their display and appearance.

Part Four – Using Statistical Tools for Analyzing Data

Discovering and verifying meaningful patterns and correlations in data requires quantitative techniques that can analyze the data in a rigorous mathematical or algorithmic way. Part Four presents how to use various basic statistical tools to analyze the example dataset. The use of these tools is demonstrated and various example diagnostics, patterns, and correlations are examined in the example dataset.
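
To give a flavor of what "basic statistical tools" can look like in practice, the following Python sketch computes a mean, a sample standard deviation, and a Pearson correlation coefficient. The paired values and the `pearson_r` helper are hypothetical and chosen only for illustration; the sessions may use different tools and libraries.

```python
import math
import statistics

# Hypothetical paired measurements, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


print("mean of y:", statistics.fmean(y))
print("sample std dev of y:", statistics.stdev(y))
print("Pearson r:", pearson_r(x, y))
```

A coefficient near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship; it says nothing about causation.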

Continue with ArcGIS Online (Hands-on)

Instructor: Dennis Milechin (milechin@bu.edu), BU Research Computing Services

In this 2-part sequence that builds on all three of the previous programming tracks (R, MATLAB, Python), we will explore how to create Web Maps using ArcGIS Online. ArcGIS Online is a cloud-based service provided by ESRI, which allows one to create web maps and share them with the world.

No prior experience in GIS is required and no specific programming language experience is required.

ArcGIS Online requires an account, and we are creating accounts for those who need one. Register for either the ArcGIS Online “have account” or “need account” session as appropriate. Note that we need one week of lead time to create accounts, so register early!

Part One – What is ArcGIS Online?

  1. Review the general workflow in ArcGIS Online.
  2. Explore the ArcGIS Online web interface.
  3. Learn how to upload County Health Ranking Data.
  4. Find spatial data published by others.
  5. Create a web map (importing layers, setting symbology, etc.).

Part Two – Using the web maps from Part One to create the following:

  1. A dashboard application to allow web map visitors to explore the data.
  2. A story map to present a narrative of results or findings.

Vendor Presentations and Workshops

Using MATLAB with Python (Lecture)

Instructor: Sean de Wolski, Application Engineer, The MathWorks, Inc.

Engineers who rely only on Python may encounter challenging tasks when it comes to embedded applications, building interactive dashboards, parallelizing applications, and deep learning. In contrast, MATLAB is a full-stack advanced analytics platform that empowers domain experts to rapidly prototype ideas, validate models, and push applications into production with ease. However, it is sometimes advantageous to integrate MATLAB and Python. One example is the need to combine MATLAB’s vast library of advanced analytics capabilities with supplemental models available in the open source community. Another is using Python as a language well suited to piping data between different IT systems or the web.

There are several ways to integrate MATLAB and Python, either as R&D tools or as scalable components of your production infrastructure. The latter gives business users and decision makers immediate access to many of MATLAB’s built-in analytics capabilities, including deep learning, optimization, signal and image processing, computer vision, data mining, time-series forecasting, embedded code generation, and more.

In this session we demonstrate the many ways in which MATLAB and Python can interface and integrate with each other.

Highlights include:

  • Calling Python libraries directly from MATLAB
  • Calling Python from within a Simulink Model

MATLAB – Hands-On Virtual Lab: Machine Learning (Hands-on)

Instructor: Elvira Osuna-Highley, Senior Customer Success Engineer, The MathWorks, Inc.

Machine learning is a data analytics technique that teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model.

Using MATLAB, engineers and other domain experts have deployed thousands of applications for predictive maintenance, sensor analytics, finance, and communication electronics.

In this hands-on workshop, you will use MATLAB to:

  • Learn the fundamentals of machine learning and understand terms like “supervised learning”, “feature extraction”, and “hyperparameter tuning”
  • Build and evaluate machine learning models for classification and regression
  • Perform automatic hyperparameter tuning and feature selection to optimize model performance
  • Apply signal processing and feature extraction techniques

Please make sure you have MATLAB R2020a or R2020b installed to ensure the best experience.

Scaling Up MATLAB Applications to Clusters and Clouds (Lecture)

Instructor: Div Tiwari, Customer Success Engineer, The MathWorks, Inc.

Large-scale simulations and data processing tasks that support engineering and scientific activities such as mathematical modeling, algorithm development, and testing can take an unreasonably long time to complete or require a lot of computer memory. You can speed up these tasks by taking advantage of high-performance computing resources, such as multicore computers, GPUs, computer clusters, and cloud computing services.

Using the Parallel Computing capabilities in MATLAB allows you to take advantage of additional hardware resources that may be available either locally on your desktop or on clusters and clouds. By using more hardware, you can reduce the cycle time for your workflow and solve computationally intensive and data-intensive problems faster.

We will discuss and demonstrate how to perform parallel and distributed computing in MATLAB with minimal changes to your code. We will introduce you to parallel processing constructs such as parallel for-loops, batch processing, and distributed arrays. Discover how you can easily scale your MATLAB applications and leverage cluster and cloud resources from providers such as AWS and Azure.

Highlights include:

  • Built-in support for parallel computing
  • Creating parallel applications to speed up independent tasks
  • Scaling up to computer clusters, grid environments or clouds
  • Employing GPUs to speed up your computations
  • Programming with tall and distributed arrays to work with large data sets

Mathematica Programming Workshop (Hands-on)

Instructor: Kelvin Mischo, Wolfram Research, Inc.

Kelvin Mischo is the coauthor of Hands-on Start to Wolfram Mathematica and Programming with the Wolfram Language.

This 2-hour introductory-level training session is suitable for students, faculty, and staff just starting out with Mathematica, or for those who could use a refresher. The instructor will start with a blank Wolfram Notebook and build up calculations with attendees typing along. Version 12.2 of Mathematica was just released, and the workshop will also highlight new aspects of Mathematica 12.2 such as entering TeX in Wolfram Notebooks, new AI and deep learning capabilities, and unique data science calculations and visualizations.

The content will be an overview of the general conventions used throughout the language in any discipline. Examples will highlight solving problems in mathematics (symbolic calculations, solving integrals, solving problems in linear algebra), creating mouse-driven models for animations or simulations, and data analysis. Common ways to share finished projects will also be shown, including sharing Wolfram Notebooks with anyone (including recipients who do not use Mathematica) and building web forms with embedded computation.

Each attendee will need either a local installation of Mathematica or Mathematica Online.

Data Science Workshop in Wolfram Language & Mathematica (Lecture)

Instructor: Kelvin Mischo, Wolfram Research, Inc.

Wolfram Language, which is included in Mathematica, has extensive capabilities in data science. This workshop will show how to import external data, use Wolfram’s curated data sets, and scrape web sites. Additional examples will show how to perform statistical analysis, create unique graphical representations of data, find unique patterns in data and filter data, perform text processing, as well as work with machine learning and neural networks.

This high-level language automates many aspects of calculations related to data science, making it suitable for prototyping ideas very quickly or for serving as a primary language for users without a coding background.

NVIDIA: GPU accelerated data science using RAPIDS (Hands-on)

Co-Instructor: Matthew Jones, NVIDIA
Co-Instructor: Tomek Drabas, BlazingSQL

RAPIDS is a GPU-accelerated platform for data science that greatly reduces time-to-solution. In this workshop, attendees will learn how GPUs are accelerating end-to-end data science and analytics pipelines. Analyzing large datasets can be broken down into three phases: ETL, machine learning, and visualization. Attendees will interactively walk through a sequence of tutorials representing a real-world workload of those phases using RAPIDS. They will code guided solutions using BlazingSQL, cuDF, and cuML. By the end of the workshop, attendees will know how to get started with any library in the RAPIDS ecosystem.