Prof. Marc Rysman has given a seminar about why and how to use the cluster every few years. Here are the slides, with some minor updates:
Prof. Schmieder’s Slides on Research Computing
High Performance Computing for BU Economists (Previous version of slides)
Important: The slides describe how to obtain access to the cluster.
The faculty RCS liaison is Jean-Jacques Forneron. Graduate students who want access to the cluster should email Jean-Jacques.
The RCS Student Ambassador is Peter Deffebach. For help, you can reach the RCS Student Ambassador at rcs_sa_econ@scc.bu.edu.
Available software on the SCC: http://sccsvc.bu.edu/software/#/
The SCC and the pool of computers supporting the economics department are optimized for 28-core jobs. If you are running large multi-core jobs (that is, using parallel processing), requesting 28 cores should get your jobs started the fastest.
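As a rough illustration of using the cores a job has been granted, the Python sketch below sizes a worker pool from the NSLOTS environment variable. It assumes the cluster's Grid Engine scheduler exports NSLOTS for jobs submitted with a parallel-environment request (for example, something like -pe omp 28); the function names worker_count and square are placeholders, not part of the sample code linked on this page.

```python
import os
from multiprocessing import Pool


def worker_count(default=1):
    """Return the number of cores granted by the scheduler.

    Assumption: the scheduler exports NSLOTS for parallel jobs.
    When the variable is missing (e.g. on a laptop), use `default`.
    """
    return int(os.environ.get("NSLOTS", default))


def square(x):
    # Placeholder task standing in for one unit of real work.
    return x * x


if __name__ == "__main__":
    n = worker_count()
    # One worker process per granted core, so the job does not
    # use more cores than it requested.
    with Pool(processes=n) as pool:
        results = pool.map(square, range(8))
    print(n, results)
```

Reading the core count from the environment, rather than hard-coding 28, keeps the same script usable for smaller test runs.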
Sample code
Code is also provided for the dynamic investment problem described in the slides. Examples in Gauss, Matlab, Python, Stata, and R are provided below. In some examples, separate code is given with and without parallel processing. Note that some problems are so simple that using parallel processing may slow the program down. The code is meant as an example.
Note that sites.bu.edu accepts files only with particular extensions, so the sample computer code has .txt extensions. You might want different extensions in practice. You need to remove the .txt from the batch file to use it.
Gauss:
Matlab:
Matlab has two ways of implementing parallel processing. Examples of both are provided. For this example, SPMD is a little more efficient.
You will also need these files to run in Matlab: getVnew profit
The slides also describe a batch file for running Matlab in batch mode on the cluster. Here is an example:
Matlab_batch
(Change the extension to .sh to use this batch file.)
Thanks go to Mingli Chen and Kadin Tseng, who were a lot of help with the Matlab files.
For the examples below, change the extension of each batch file to .sh before using it.
R:
This is an example of how to run a GMM estimation in R on the cluster with a batch file.
R_ReadMe gmm_example R_batch
Stata:
This is an example of how to run a simple Stata do file on the cluster with a batch file.
R_ReadMe do_example Stata_batch
Python:
This is an example of how to do web scraping in Python on the cluster with a batch file.
Python_ReadMe webscrap Python_batch
Use array jobs to submit batch jobs more efficiently.
Scenario: You have a number of R scripts to run, or want to run one script multiple times (e.g., you are running simulations or doing a grid search over model parameters).
For example, suppose you want to run the GMM estimation from the R example above with four different sets of initial values. Instead of submitting four separate batch jobs, you can submit a single array job.
gmm_example_1 gmm_example_2 gmm_example_3 gmm_example_4 batch_array
https://github.com/bu-rcs/SA-Economics
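The array-job idea can also be handled with a single script that picks its own inputs. The Python sketch below assumes the scheduler sets the SGE_TASK_ID environment variable to 1, 2, 3, ... for each task of an array job (for example, one submitted with an option such as -t 1-4). The starting values and function names are hypothetical and are not taken from the linked example files.

```python
import os

# Hypothetical starting values for the four runs; replace with your own.
INITIAL_VALUES = [
    (0.0, 1.0),
    (0.5, 1.0),
    (1.0, 2.0),
    (-1.0, 0.5),
]


def task_index(env=os.environ):
    """Return the zero-based index of this task within the array job.

    Assumption: the scheduler sets SGE_TASK_ID to 1, 2, ... for array
    tasks. Outside an array job the variable may be absent or hold a
    non-numeric value such as "undefined"; both are treated as task 1.
    """
    raw = env.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    if not 1 <= task_id <= len(INITIAL_VALUES):
        raise ValueError(f"task id {task_id} has no matching initial values")
    return task_id - 1


def initial_values(env=os.environ):
    # Look up the starting values assigned to this task.
    return INITIAL_VALUES[task_index(env)]


if __name__ == "__main__":
    start = initial_values()
    print(f"Starting estimation from {start}")
```

With this pattern, one script and one batch file cover all four runs, and adding a fifth set of initial values only requires extending the list and the task range.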
Currently Available Data
The IED and the Department of Economics have purchased several licenses for various datasets to support students in their research.
We are currently in the process of making more data available through the Research Data Network. Information regarding new data will be posted here as it becomes available.
Select the data below to open a detailed description:
The Nielsen Datasets from the Kilts Center for Marketing
About:
Approved users at Boston University can access the Retail Scanner data, the Consumer Panel data, and the PromoData. These datasets cover a wide range of products from a large set of retailers over time. The datasets are an ideal source for studying the purchase behavior of items typically purchased in grocery, drug, and convenience stores. The Retail Scanner data contains weekly price, sales, and store environment information provided by more than 90 retailers. The Consumer Panel data contains the purchases of fast-moving consumer goods for a set of 40,000 to 60,000 households over time. The PromoData contains detailed manufacturer costs and allowances, introductions of new products, and price changes for all major grocery wholesalers from major markets.
Obtaining Access:
Boston University has an institution-wide subscription to the Nielsen Datasets from Kilts. As such, the datasets are available to tenured faculty, tenure-track faculty, PhD students, and postdoctoral researchers. To apply for access, follow the information and access link below. Select “request subscriptions or register” under the new users heading and find Boston University in the list of institutions. Then follow the instructions. The contact for the Nielsen Datasets is Adam Guren (guren@bu.edu).
Links:
For more information and access:
https://www.chicagobooth.edu/research/kilts/datasets/nielsen
To see how other researchers have used the Nielsen Datasets:
https://papers.ssrn.com/sol3/JELJOUR_Results.cfm?form_name=journalbrowse&journal_id=1829785
Airline Origin and Destination Survey
About:
The dataset consists of a 10% sample of airline tickets. The survey contains information on the origin, destination, and intermediate points of flights, as well as information on prices and distance travelled. It also contains information on the carriers, but does not include aircraft data.
Access:
Information on domestic flights is publicly available, but information on domestic-to-international flights is restricted. The Department of Transportation manages access to this restricted flight data. Marc Rysman has a copy of the restricted data, and students can use it with the Department of Transportation’s approval. If you have questions regarding the data or obtaining access, please contact Marc Rysman at mrysman@bu.edu.
Links:
For more information about the data:
https://www.transtats.bts.gov/tables.asp?Table_ID=272&SYS_Table_Name=T_DB1B_TICKET
For more information on obtaining Department of Transportation approval:
https://www.bts.dot.gov/topics/airlines-and-airports/restricted-data
Indian Firm Data
About:
The following data on firms in India are available:
- Annual Survey of Industries (1998/9 to 2011/12)
- Economic Census (1998, 2005)
- NSS Unorganized Manufacturing Surveys (2000/1, 2005/6, 2010/11)
Access:
For details regarding these datasets and instructions for access, see https://www.bu.edu/econ/files/2016/01/Presentation-on-BU-Indian-Firm-Data-April-3-2014-Aug-2015-update.pdf
Data Acquired for PhD Research Projects
About:
The following data was acquired for previous research projects, and has since become publicly available online.
The Sixth Economic Census offers a complete enumeration of all enterprises in India (except those engaged in crop plantation and cultivation, public administration, defense, and compulsory social security) and is identifiable at the village level. Available information includes but is not limited to the number of establishments, the number of persons employed therein, corresponding industries, and ownership status. Faculty and PhD students of the economics department interested in accessing this data should email iedcoord@bu.edu for more information.