Student Colloquium 2017 Computing Science

Schedule

Morning
Afternoon

08:45

The inspection of the blood vessel tree in the fundus, which is the interior surface of the eye opposite to the lens, is important in the determination of various cardiovascular diseases. This can be done manually by ophthalmoscopy, which is an effective method of analysing the retina. However, it has been suggested that using fundus photographs is more reliable than ophthalmoscopy. Additionally, these images can be used for automatic identification of the blood vessels, which can be a difficult task due to obstacles such as low contrast with the background, narrow blood vessels and various blood vessel anomalies. A segmentation method with high accuracy can serve as a significant aid in diagnosing cardiovascular diseases, as it highlights the blood vessel tree in the fundus. In recent years, several segmentation methods have been proposed for the automatic segmentation of blood vessels, ranging from using cheap and fast trainable filters to complicated neural networks and even deep learning.

In this paper we discuss and evaluate several of these methods by examining the advantages and disadvantages of each. Subsequently, we take a closer look at a filter-based method called B-COSFIRE. We study the performance of the method on test datasets of fundus images and we examine how the parameter values affect the performance. The performance is measured by comparing the extracted blood vessel tree with a manually segmented blood vessel tree. One of the datasets we consider is the recently published IOSTAR dataset and, if researchers have used the dataset already, we compare our results, using the IOSTAR dataset, with findings about blood vessel segmentation methods on this dataset in the field.

Based on our findings there we discuss when B-COSFIRE is the preferred method to use and in which circumstances could it be beneficial to use a more (computationally) complex segmentation method. We also shortly discuss areas beyond blood vessel segmentation where these methods can be used to segment elongated structures, such as rivers in satellite images or nerves of a leave.

09:35

Vector graphics primitives

Joël Grondman & Klaas Kliffen

A raster image is represented by a grid of pixels which is limited by the resolution used during drawing. To solve this problem a different approach in image representation has been developed, called vector graphics. Several vector graphics primitives exist which can mostly be categorized under elemental gradients, gradient meshes and diffusion vectors. Each primitive has its own strengths and weaknesses in terms of limitations and use of resources.

For each of these three primitives we will discuss their way of representing a vector image. There will be no objective “best” primitive, so we will focus on what kind of image can be represented by each primitive more easily. Finally we compare each primitive method by their complexity, flexibility and resource usage in the process of creating an image.

Each primitive category has their own varying methods which will be discussed and reviewed briefly including examples. Primitives in different categories rely on different methods of rendering an image. Their methods will therefore not be compared, but the effort required in using the methods as well as the resources needed will be our main way of comparing different primitives. We expect each category of primitives to have their own strengths and weaknesses which means none of them are redundant in general and their usefulness depends on the intended application.

10:05

2D keypoint detection and description

Willem Dijkstra & Tonnie Boersma

In computer vision and image processing there are several algorithms for detecting features in an image. Features in this case are 2D keypoints which represent points of interest in an image. Many computer vision algorithms use these features as the initial step for subsequent algorithms. Each algorithm varies widely in the kind of features detected (e.g. edges, corners or blobs), the computational complexity and the repeatability. Repeatability in this case refers to detecting similar keypoints in similar scenes regardless of distortion. An optimal feature detection algorithm should be able to detect a wide variety of features, have a minimal computational cost and a high repeatability. Our research will cover the following algorithms for 2D keypoint detection and description; SIFT, SURF, ORB, KAZE and BRISK.

We will compare these algorithms based on their respective paper in combination with experiments. These experiments are focused on the repeatability under various conditions like rotation and distortion of images. Based upon the results of these experiments and the provided papers, the pros and cons of the used algorithms will be discussed. For the experiments we got the advise to use the sRD-SIFT dataset which can be used for evaluation of keypoints and descriptors. This dataset includes three planar scenes that are related by an homography between 78 pairs of images, which allows the results of the experiments to be compared to a ground-truth. These images will be used to compute the detection repeatability of the 2D keypoint detection and description algorithms. In the end a comparison is made based upon the performance of the 2D keypoint detectors and descriptors, by evaluating the repeatability of the keypoints.

10:35

Unveiling storytelling and visualization of data

Ankita Dewan & Stephanie Arevalo

Yoder-Wise & Kowalski present storytelling as ”an art focused on a desire to connect with the users in a meaningful and purposeful way”. Effective visual storytelling has the power of making a connection with an audience and here lies the importance of its study. The art of creating visual stories from scientific data is not an easy task, due to the all the elements that are combined in the process. As a manner of disclosure, it can be divided into three components that will ease its analysis: scientific data, visualization and a narrative representation. These components can support each other in transforming data into knowledge, in a manner that facilitates coherent flow of information without compromising on the clarity. Moreover, every data driven story depends primarily on the context, the target audience and the message that needs to be communicated.

The purpose of this research is to highlight a lesser known side of data visualization and the potential of using storytelling as a way of describing scientific data. In a way that promotes audience engagement, increases awareness, and envisages data in an interactive way. This document intends to decode presenting data in a visual manner through storytelling. The first section of the paper contains concepts related to visual data, storytelling and the narrative process. Then, we introduce some of the popular existing tools for visualizing data and techniques in order to prepare the reader for an illustrative example using one of the most popular existing tools. The final section of the paper provides an overall conclusion on the information that was presented and how the future is envisioned in the area.

11:05

Coffee & tea break

11:15

An analysis of optimisation methods to improve data centre efficiency

D. I. Pavlov & R. M. Bwana

Even though data centers have been around for decades, due to the increased amount of digitalized data, nowadays they have become a central point for modern software technology. Some of their requirements such as scalability and availability are met at the cost of continuously increasing power consumption. It is for this reason that recent research has begun to focus on reducing the amount of conventional power consumed by data centers.

The literature available has focused on introducing algorithms to increase the efficiency of data centers in terms of network efficiency, job scheduling efficiency, and power efficiency: including introducing the use of renewable energy sources alongside the conventional non-renewable energy sources.

Our research focuses on reviewing literature that discusses increasing data center energy efficiency through the use of optimization strategies as typically found in Machine Learning. This is generally achieved through describing a cost function representing appropriate negative metrics that the model then actively seeks to minimize and in doing so, in the case of the literature we focus on, reduces energy consumption at any given moment. We analyze literature discussing these optimization methods for data centers and the results they obtained. In doing so we seek to address whether or not these optimization methods have, or could, adequately improve data center energy efficiency.

11:45

Languages for software-defined networks

Aida Baxhaku & Timo Smit

Software-defined networking (SDN) is gaining a lot of attention over the past years as the more future-proof approach of organizing networks in terms of both software and hardware. The notion of SDN originates from two main concepts: a more extensive feature set, to allow for more dynamic and scalable solutions, as well as to decouple software and hardware, which allows software to adapt for changing requirements independently from used hardware.

In recent years many advances have emerged, particularly in the field of information technology, data science and networking. However, the traditional technologies behind networks are often low-level and lack features to support the nowadays common modular approach of networking. Where networks used to be declared statically and heavy in terms of maintenance, this approach involves high availability and scalability, by allowing extra instances to spawn and connect on-the-fly to relieve the workload on other nodes in the network. With this development, the overall complexity and problems that come network management have grown, e.g. monitoring network traffic, specifying and composing packet forwarding policies, and updating these policies in a consistent way. In this paper we will discuss the available languages for network programming to date, as well as find out what their limitations are. We do so by analyzing their specifications, performing literature research on the selected set of languages and finally comparing the results of the analysis. We evaluate and compare the available languages based on their requirements, their feature set and their performance.

12:15

Stochastic Gradient Optimization: Adam and Eve

Jos van de Wolfshaar & Siebert Looije

Recently, a large share of the machine learning community has shifted its focus towards deep learning. This paradigm shift is due to significant successes in a wide range of sub-domains such as image classification, face verification and speech recognition, often outperforming humans at the task they are trained for. The introduction of new optimization algorithms for gradient descent have boosted the scientific progress. These algorithms seek to minimize a cost function such that e.g. a classifier learns to perform its task. Until recently, the most popular form of gradient descent has been stochastic gradient descent (SGD).

However, in the past few years alternative optimization algorithms have been introduced that exhibit superior convergence guarantees and speeds. Two particularly successful alternatives to SGD are Adam and Eve. These algorithms adaptively tune the learning rate, yielding state-of-the-art performance. Both algorithms use running averages of the gradient and the squared gradient. Eve improves on Adam by also exploiting loss-function feedback. In our paper, we validate the improvement of Eve over Adam in a range of experiments that test the applicability outside the domain of the experiments that the authors of the Eve algorithm originally assessed. By comparing the optimizers in this unexplored area of problems, we hope to find empirical support for the superiority of the Eve optimizer. If our experiments’ results suggest otherwise, we try to characterize what has caused the relatively impeded performance.

12:45

Introduction of keynote speaker

12:50

Keynote speaker: Jeroen Vlek

13:20

Lunch break

13:50

Explaining Multidimensional Visualization Techniques

Harry Jackson Arroyo & Carlos H. Paz Rodriguez

With the continuous increase of big scientific and business datasets and the need to analyse them, it is becoming more important to find tools to enable a better and more intuitive understanding of the information being presented, especially when visualizing complex interactions between many data features or dimensions. Thus, many methods have been developed to cope with some of the previous data-analysis needs; and to provide users with the means to expedite their decision-making process by presenting clearer conclusions to the viewers. By explaining a few real-world high-dimensional dataset projection techniques, we can offer end users with one or more recommendations for different scenarios. We look at some methodologies, such as multidimensional categorical datasets, attribute-based visual explanations and 3D dimensionality reduction plots; and list the advantages and disadvantages of each technique from a scientific and business oriented perspective. This will provide the end users a broader understanding of the most common problems when dealing with high-dimensional datasets and help them better decide which tool should be applied based on their needs.

We will compare each method based on its ability to intuitively reflect what is the most important discovery or confirmation the dataset produces the level of interaction or options it provides and the multipurpose flexibility it offers. Each technique should be capable of producing usable results in any given scenario but we expect each is probably better for some specific scenarios.

14:20

Support for Cloud Migration

Peter Ullrich & Timon Back

Migrating an enterprise system toward a cloud computing environment requires ample preparation and poses certain risks and difficulties, which are best evaluated before starting the process. Moreover, the migration process is iteratively performed in multiple steps, for each of which scientific tools offer automatizing and support.

In this paper, we describe the steps necessary for migrating an enterprise system to a cloud based infrastructure. We give an overview of a chosen collection of scientific tools, which help with each of these steps. The migration process can be divided into a preparation and execution part, each of which use a different set of tools. Tools used in the preparation phase offer decision support for finding the best suited cloud service provider and cloud machine instance type to avoid under- and over-provisioning. Some of the considered tools for the execution phase contain frameworks for semi-automatic migration, which also take a look at dependencies in the software or processes.

We present the approaches of these tools and give an analysis of their usefulness depending on the situation in which they are used. Eventually, we identify areas of the migration steps, which lack coverage of scientific tool support and propose functionality for such tools.

14:50

The ideal architecture decision management approach

Petar Hariskov & Alexandra Matreata

Context: Architectural knowledge management defines key aspects of system architectures such as the traceability of design issues towards implementations, communication and cooperation between different parties such as stakeholders, architects, developers. By interacting with mechanisms for capturing, sharing and managing changes in different design decisions from an early stage, architects are able to build solid, robust and more easily maintainable systems.

Problem: Despite the numerous advantages and improvements in quality it brings, architectural knowledge management is rarely implemented in practice. The reason for this is the highly time consuming nature of the process. In practical real-life scenarios, projects are often required to have their time to market as compressed as possible, leading to the neglection of this phase in most cases. The key focus of all architectural knowledge management activities is represented by design decisions and a significant amount of research has been conducted on this particular branch.

Solution: The aim of this paper is to asses the utility and quality of different existing tools and approaches for architecture decision management and determine what attributes and specifications could be associated with an "ideal" such approach. The assessment of the value of different methods used for architectural knowledge management will be performed through a secondary research process taking into consideration a collection of carefully selected primary studies. Significant and precise indications as to what an ideal approach consists of, will be obtained by comparing and analysing the solutions described in these studies against each-other and against estimated expectations of the parties involved in the project architecting phase(stakeholders, architects, etc.).

15:20

Coffee & tea break

15:30

An Overview of Energy Efficient Scheduling in Data Centers

Win Leong Xuan & Martin Glova

Nowadays, we are living in a society where everything is migrating to the cloud, or so called "the internet" and people want to stay connected everywhere they go. Therefore, data centers are being established in order to fulfill this new lifestyle. Data centers consume a massive amount of energy in order to run its computational tasks. Most of this energy is being produced from traditional fossil fuels. Renewable energy is also one of the major trending technologies when considering the protection of the environment and the energy crisis for fossil fuels. As the worldwide awareness of environment protection and energy crisis increases, the demand for the cleaner products and services also increases. The challenge which comes with renewable energy is that it is dependent on different factors e.g. the time of day, the weather, the season and so on.

The purpose of the paper is to present mathematical models of energy scheduling and different scheduling strategies with a focus on optimizing different properties. Because of production fossil fuels, some of the strategies optimize green energy use. In another way, some of them optimize utilization connected with energy using at all which also decrease energy costs. We will describe some of the existing solutions as Smart green energy-efficient scheduling strategy and others.

Finally, we will discuss the tradeoffs between the different scheduling strategies and what results did the different scheduling methods achieve. We will also discuss the experimental use of different scheduling strategies and reinterpret their results.

16:00

Assessing the Novelty of Extreme Learning Machine (ELM)

Fthi Abadi & Remi Brandt

Extreme learning (Haung et al. in 2004) is a learning algorithm for single hidden layer feedforward neural networks. In the ELM, the weights from the input layer and bias to the hidden layer nodes are assigned randomly. The weights from the hidden layer to the output layer nodes are determined analytically. According to its authors, the ELM provides superior generalization and learning efficiency compared to the prior learning algorithms which use an iterative strategy to determine neural network parameters. It has been claimed that the essence of the ELM was proposed prior to Haung et al. by Schmidt et al. in 1992, Pao et al. in 1992 and Broomhead and Lowe in 1988. This has led to a debate concerning the novelty of the ELM and hence the necessity and justification to introduce a new name: "Extreme learning machine".

The ELM as well as prior work of which it is supposed that the essence is equal to that of the ELM (Schmidt et al. in 1992, Pao et al. in 1992 and Broomhead and Lowe in 1988) will be discussed. The similarities and differences of the algorithms proposed in said papers with respect to learning strategy, approximation and unknown determination will be discussed. Subsequently, arguments regarding the uniqueness of the methodologies in the ELM compared to the methodologies proposed by the prior work will be presented. Finally, conclusions regarding the novelty of the ELM will be given.

16:30

Techniques for the comparison of public cloud providers

Frans Simanjuntak & Marco Gunnink

Cloud computing is the delivery of computing services over the Internet. It enables users to deploy applications on hardware provided by external providers, rather than building and maintaining their own servers. Cloud computing rose to popularity from 2006 with the introduction of Amazon Elastic Compute Cloud, Microsoft’s Azure and the Google App Engine. The main advantages of cloud computing for the users are: cost efficiency, scalability, rapid deployment and accessibility. It also offers advantages to the cloud providers, such as cost efficiency and a more optimal usage of their computing infrastructure. There are also downsides: security, dependence on Internet connectivity and less control over the platform.

We mentioned a few public cloud providers and there are several more. Choosing the most suitable one for a business’ need is not easy. Different cloud providers offer different trade-offs in performance, cost and functionality. Comparing them is difficult, since a lot depends on what the business needs while all cloud providers attempt to make themselves appear attractive to all potential customers. Despite this, several techniques for comparing public cloud providers have been researched and developed, such as: Analytic Hierarchy Process (AHP), Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), ELECTRE, CSP (Cloud Service Provider), CloudCmp framework, Fuzzy Inference System (FIS), Multi Attribute Group Decision Making (MAGDM). In this paper we compare AHP, CloudCmp and the TOPSIS to determine the most suitable comparison technique.

17:00

Multidimensional projections: Scalability, Usability and Quality

Frank Mol

In many application fields large amounts of multidimensional data are produced. In fields such as engineering, medical sciences and business intelligence, data is produced where each data point has a large amount of attributes. Because of the large amount of variables (larger than 5) per data point, this data is considered high dimensional Multidimensional Projection (MP) makes use of techniques that can visualize data with a high number of dimensions, in either 2D or 3D. In our paper, we restrict ourselves to MP techniques which map from n dimensions to 2 dimensions. LAMP, LLE, Isomap and t-SNE are such techniques which are currently state-of-the art. All of the techniques face a number of challenges. On of the challenges is the comprehensiveness of the projection. Another challenge is that a projection needs to be able to visualize certain information on the given data, i.e. data patterns. Also, a challenge above mentioned techniques face is the time complexity.

Since the techniques differ in the above mentioned challenges, a user needs to select a technique that suits their goal. Hence, we give an overview of state-of-the-art MP techniques, which we will compare using the following three characteristics: scalability, usability and quality. One of the papers suggest that many different techniques exist that create a 2D projection. These can be classified as follows: "Distance versus neighborhood preserving", "Global versus local" and "Dimension versus distance". We will use at least one technique of each classification in our analysis.

Last update: 30-03-2017

Keynote speaker

Jeroen Vlek

CTO of Anchormen

Digitisation is changing the world at an unprecedented pace. While continuing to learn from the past, companies need to be aware of the current state of play and to anticipate future developments. Data volumes, data variety and the speed of data are all increasing. By investigating structured and unstructured data, (prediction) models can be created, resulting in valuable new insights. These insights help companies to respond at the right time and in the right way, based on the right information. Jeroen Vlek, CTO of Anchormen, will show several examples of this 'actionable insight', whether this is predicting maintenance on railway switches, preventing dirty drinking water or giving personal recommendations to consumers.

About the conference

7^th

April

2017

Bernoulliborg 5161

Nijenborgh 9, 9747 AG Groningen

Every year, students of the Computing Science Master's programme at the University of Groningen will organize a student colloquium conference SC@RUG, bringing together students from Computing Science and its staff. SC@RUG is devoted to research in computing science. Previous sc@RUG have had a broad range of presentations in the field of surveys, tutorials and case studies, and we hope to even extend that range this year.

This year, the 14th iteration of the conference will take place. It will be held on the 7th of April 2017.

Organizing committee

This years organizing committee consists of the following members:

Michiel Straat
Timo Smit
Fthi Arefayne Abadi
Alexandra Matreata
Ankita Dewan
Frans Simanjuntak
Joël Grondman
Robert Bwana
Carlos Paz
Win Leong Xuan
Siebert Looije
Peter Ullrich
Aida Baxhaku

Sponsors

Schedule

Doors open

Introduction by honorary chair

Segmentation of blood vessels in retinal fundus images

Vector graphics primitives

2D keypoint detection and description

Unveiling storytelling and visualization of data

Coffee & tea break

An analysis of optimisation methods to improve data centre efficiency

Languages for software-defined networks

Stochastic Gradient Optimization: Adam and Eve

Introduction of keynote speaker

Keynote speaker: Jeroen Vlek

Lunch break

Explaining Multidimensional Visualization Techniques

Support for Cloud Migration

The ideal architecture decision management approach

Coffee & tea break

An Overview of Energy Efficient Scheduling in Data Centers

Assessing the Novelty of Extreme Learning Machine (ELM)

Techniques for the comparison of public cloud providers

Multidimensional projections: Scalability, Usability and Quality

Keynote speaker

Jeroen Vlek

CTO of Anchormen

About the conference

Organizing committee

Location