Feb 3rd, 2017
9:00 am - 12:00 pm
Instructors: Anne Fouilloux
Helpers: Hugues Fontenelle
Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.
For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".
Who: The course is aimed at graduate students and other researchers. This one-day Carpentry@UiO hands-on workshop will give a short introduction to big data analysis using PySpark. The Spark Python API (PySpark) exposes the Spark programming model to Python. Apache® Spark™ is open source and is one of the most popular Big Data frameworks for scaling up your tasks on a cluster. It was developed to use distributed, in-memory data structures to improve data processing speeds. A basic knowledge of Python is recommended, but you don't need any previous knowledge of big data analysis or Apache Spark.
Where: FIXME. Get directions with OpenStreetMap or Google Maps.
Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by Software Carpentry's Code of Conduct.
Accessibility: We are committed to making this workshop accessible to everybody. The workshop organisers have checked that:
Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organisers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch and we will attempt to provide them.
Contact: Please email contact-us@swcarpentry.uio.no for more information.
09:00 | Introduction to Big Data
10:00 | MapReduce Programming Paradigm
10:30 | Coffee
11:00 | MapReduce Programming Paradigm
12:00 | Wrap-up
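The MapReduce sessions start from pure Python. As a minimal sketch (not the lesson's own code; the names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical), a word count can be split into the three MapReduce steps: map each line to (word, 1) pairs, group the pairs by key, then reduce each group to a single value.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of text.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values that share a key; here we add up the counts.
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(text))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

In PySpark the same pattern is usually expressed with RDD methods such as flatMap, map and reduceByKey, with Spark performing the grouping step across the cluster.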
Etherpad: http://pad.software-carpentry.org/2017-02-03-pyspark.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.
To save time and avoid installing Spark on your laptop, we will use the UiO Galaxy eduPortal.
If you haven't received a login and password yet, don't panic: this can be sorted out in a few minutes during the workshop.
For the workshop you will need a web browser (Firefox, Google Chrome or Internet Explorer) and a wireless
internet connection. For more information on how to connect to the wireless network at UiO, see
"Connect to UIO wireless".
If you are not affiliated with the University of Oslo and do not have an eduroam account, you can still use our guest Wi-Fi network.
See detailed instructions here.
Note: without changing your PySpark code, you can scale it up to hundreds of processors on any cluster or HPC system that runs Spark.
At the University of Oslo, you may use the UiO HPC cluster Abel. See detailed information
here.
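To illustrate the idea behind this note without Spark, the sketch below keeps the mapper and reducer fixed and only swaps how the map step is executed: serially with Python's built-in map, or concurrently with a thread pool. This is a single-machine analogy, not how Spark actually schedules work, and the names count_words, merge and run are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    # Mapper: count the words in one chunk of lines, independently of other chunks.
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge(partials):
    # Reducer: add up the partial counts produced for every chunk.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(chunks, executor_map=map):
    # The mapper and reducer never change; only the "map" strategy is swapped.
    return merge(executor_map(count_words, chunks))


if __name__ == "__main__":
    chunks = [["the quick brown fox"], ["the lazy dog"], ["The end"]]
    serial = run(chunks)
    with ThreadPoolExecutor(max_workers=3) as pool:
        parallel = run(chunks, pool.map)
    print(serial == parallel)  # True
```

Threads are used here only to show that the worker functions stay the same; because of CPython's global interpreter lock they will not speed up CPU-bound counting. Spark instead distributes partitions of the data to executor processes across the cluster.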
Installing Python on your laptop is not mandatory, but the first part of the lesson uses pure Python, so we suggest you install it. To participate in a Software Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.
Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
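Once Python is installed, a short script like the one below reports which version is running and whether it is Python 3. It is a sketch: check_python is a hypothetical helper, and you could save the script as, for example, check_python.py and run python check_python.py in a terminal.

```python
import sys


def check_python(minimum=(3, 0)):
    # Return True if the running interpreter is at least the given (major, minor) version.
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if check_python():
        print("OK: Python 3 is installed.")
    else:
        print("Please install Python 3.x (e.g. via Anaconda).")
```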
We will teach Python using the IPython notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).
Type "bash Anaconda3-" and then press Tab. The name of the file you just downloaded should appear.
Type "yes" and press Enter to approve the license. Press Enter to approve the default location for the files. Type "yes" and press Enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
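After the installer finishes and you open a new terminal, you can check that Anaconda's Python is the one found first on your PATH. This sketch assumes the installer's default location, whose path contains "anaconda" (e.g. ~/anaconda3); looks_like_anaconda is a hypothetical helper and the check is only a heuristic.

```python
import shutil
import sys


def looks_like_anaconda(path):
    # Heuristic: the default installer puts Python under a folder whose
    # name contains "anaconda" (e.g. ~/anaconda3/bin/python).
    return path is not None and "anaconda" in path.lower()


if __name__ == "__main__":
    first_on_path = shutil.which("python")  # None if no "python" is on the PATH
    print("python on PATH:", first_on_path)
    print("running interpreter:", sys.executable)
    if looks_like_anaconda(first_on_path):
        print("Anaconda is the default Python.")
    else:
        print("Anaconda does not appear first on your PATH; open a new terminal and try again.")
```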