Speakers

The participating students and abstracts.

Target Holding BV was founded in 2009 and, as a partner in the Target Project, is responsible for the valorization of knowledge. Target intends to build a sustainable economic cluster of intelligent information in the Northern Netherlands, focused in particular on the management of the very large amounts of data coming from sensor networks.

Target Holding offers solutions for storage, analysis, processing, archiving and searching in the area of large-scale data intelligence, through collaboration with research institutes and the Target knowledge cluster in the northern Netherlands. Target Holding therefore has an exclusive knowledge advantage, from which economically profitable activities are initiated through:

  • shareholdings and participations;
  • licensing of advanced technologies, products and components;
  • services through projects, consultancy, hosting and management.

ilionx is a medium-sized IT company (approximately 230 people) that implements solutions for customers throughout the Netherlands. ilionx does this in several fields, including Business Intelligence, SharePoint, .NET, IBM, cloud solutions and consultancy. Keynote speaker Matthijs Vogt is Lead Business Intelligence Consultant at Information Management, ilionx North, which means that he is involved in implementations of business intelligence solutions for customers from beginning to end.

The keynote speaker will elaborate on the following subject: in the world of process control and sensor technology, OT (operational technology) and IT (information technology) are often far apart. However, the value of data from the OT side is nowadays increasingly viewed from an IT perspective. Traditional IT solutions such as business intelligence and analytics can be applied to data from the OT world, yielding valuable new insights. At Gasunie, the gap between IT and OT has been closed by implementing a data warehouse on a Microsoft FastTrack environment, where measurements from the field are collected and can be analyzed.

Connected operators are filtering tools that act by merging elementary regions. These operators cannot create new contours nor modify their position, which makes them very attractive for filtering tasks where contour information has to be preserved. Usually, a connectivity class is based on the standard four- or eight-neighbour relation in 2D images, which is often too rigid since it cannot model generalized groupings such as object clusters or partitions. To remedy this, second-generation connectivity operators were introduced, which extend beyond standard four-connected and eight-connected path connectedness.
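
As a concrete illustration of how the choice of connectivity class changes which pixels are grouped together, the following sketch (assuming NumPy and SciPy are available) labels the same binary image under 4- and 8-connectivity; a diagonal pair of pixels forms two components under the former and one under the latter.

    import numpy as np
    from scipy import ndimage

    # Two foreground pixels touching only diagonally.
    image = np.array([[1, 0],
                      [0, 1]], dtype=bool)

    # 4-connectivity: only horizontal/vertical neighbours are connected.
    four = ndimage.generate_binary_structure(2, 1)
    # 8-connectivity: diagonal neighbours are connected as well.
    eight = ndimage.generate_binary_structure(2, 2)

    _, n4 = ndimage.label(image, structure=four)
    _, n8 = ndimage.label(image, structure=eight)
    print(n4, n8)  # 2 components vs. 1 component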

Supplementing clustering- and contraction-based second-generation connectivity, attribute-space connectivity was introduced by M.H. Wilkinson. Attribute-space connectivity is a new form of connectivity which transforms the image into a higher-dimensional space in which the connectivity is determined; this connectivity in the higher-dimensional attribute space is then projected back onto the original image. Another interesting extension is mask-based connectivity, introduced by G.K. Ouzounis, where the connectivity structure of the image is not determined by the image itself, but by a separately supplied mask image.
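
A minimal sketch of the mask-based idea (a toy example only, not the full formal definition): the grouping of the pixels of an image X is taken from the connected components of a separately supplied mask M, so pixels of X that are disconnected under the standard connectivity can still end up in one component.

    import numpy as np
    from scipy import ndimage

    # Binary image X and a separately supplied connectivity mask M.
    X = np.array([[1, 0, 1],
                  [0, 0, 0],
                  [1, 0, 1]], dtype=bool)
    M = np.array([[1, 1, 1],
                  [0, 0, 0],
                  [1, 1, 1]], dtype=bool)

    # The connected components of the mask define which pixels may be grouped.
    mask_labels, _ = ndimage.label(M)

    # Pixels of X inherit the label of the mask component they fall into; the
    # two top corners of X form one component and the two bottom corners
    # another, even though they are not 4- or 8-connected in X itself.
    components = np.where(X, mask_labels, 0)
    print(components)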

We provide a short introduction to the subfield of mathematical morphology, and an in-depth analysis of both new forms of second-generation connectivity, accompanied by a comprehensive set of examples. We discuss the key differences between the two new approaches, as well as their differences with the earlier notions of connectivity. Additionally, we provide some insight into the computational aspects of applying these connectivities to gray-scale images, by means of the dual-input max-tree structure.
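
To give a feel for connected attribute filtering on gray-scale images (the class of operations that max-tree representations are designed to compute efficiently), the sketch below uses scikit-image's area opening, which removes bright structures smaller than a given area without shifting the contours that remain. This only illustrates the operator class; it does not reproduce the dual-input max-tree algorithm itself.

    import numpy as np
    from skimage.morphology import area_opening

    # Gray-scale image with one large and one small bright blob.
    image = np.zeros((64, 64), dtype=np.uint8)
    image[8:40, 8:40] = 200    # large blob: kept
    image[50:54, 50:54] = 200  # small blob: removed by the filter

    # Remove connected bright components with an area below 64 pixels.
    filtered = area_opening(image, area_threshold=64, connectivity=1)
    print(filtered[52, 52], filtered[20, 20])  # 0 and 200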

Task scheduling plays a key role in cloud computing systems. Though several scheduling algorithms have been proposed, each has typically approached the problem from its authors’ perspective. The generality and lack of a standard definition of cloud computing, as well as the variety of usage and deployment scenarios, have allowed researchers to add customized assumptions and constraints to their scheduling models and to prefer specific requirements over others. Moreover, because the problem is NP-complete, researchers have proposed heuristic algorithms that find a ‘reasonably good’ schedule rather than evaluating all possible schedules. In this paper we give an overview of the problem of task scheduling in the context of IaaS clouds. Moreover, we present a variety of cloud scheduling algorithms, highlighting their main features, concerns and objectives. Through our work, we clarify possible reasons behind the diversity of available algorithms.
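
As an example of the kind of heuristic that is typically proposed, the sketch below (a simplified illustration, not any specific published algorithm; task lengths and VM speeds are invented) implements a greedy min-min-style rule: repeatedly assign the task/VM pair with the earliest completion time.

    # Greedy min-min style task scheduling sketch (illustrative only).
    tasks = {"t1": 40, "t2": 10, "t3": 25, "t4": 5}   # task lengths (invented)
    vm_speed = {"vm1": 1.0, "vm2": 2.0}               # relative VM speeds (invented)
    vm_ready = {vm: 0.0 for vm in vm_speed}           # time each VM becomes free
    schedule = []

    unscheduled = dict(tasks)
    while unscheduled:
        # For every remaining task, find its earliest possible completion time.
        task, vm, finish = min(
            ((t, v, vm_ready[v] + length / vm_speed[v])
             for t, length in unscheduled.items()
             for v in vm_speed),
            key=lambda entry: entry[2],
        )
        schedule.append((task, vm, finish))
        vm_ready[vm] = finish
        del unscheduled[task]

    print(schedule)                 # assignment order with completion times
    print(max(vm_ready.values()))   # resulting makespan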

An Android application is given access to resources based on what its implementers have specified. Users then have to allow access to these resources when they want to install an app on their mobile device. Often more permissions are granted than strictly necessary for the functionality of a certain application, and even apps that do not request more permissions than needed can potentially abuse the permissions they are granted. Several tools have been developed to analyse the permissions granted to Android applications; these tools check whether an application is granted permissions it does not need, or abuses the permissions it has been granted.
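
The simplest form of such a check compares the permissions declared in an app's AndroidManifest.xml against the set of permissions the app actually needs. The sketch below is a hypothetical example: the manifest path and the "needed" set are assumptions made for illustration only.

    import xml.etree.ElementTree as ET

    ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

    def declared_permissions(manifest_path):
        """Return the permissions requested in an AndroidManifest.xml file."""
        root = ET.parse(manifest_path).getroot()
        return {elem.attrib[ANDROID_NS + "name"]
                for elem in root.iter("uses-permission")}

    # Hypothetical set of permissions the app's functionality actually requires.
    needed = {"android.permission.INTERNET"}

    declared = declared_permissions("AndroidManifest.xml")  # example path
    print("Possibly unnecessary:", declared - needed)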

Our research investigates which of these tools is the most useful for applying a security check to Android apps before they appear on the Android market. When an app is not accepted by one of these tools, it should be marked as untrustworthy; otherwise it is allowed to be published.

For a selection of the tools developed to analyse apps, we check whether they provide a valuable contribution to securing the Android market. Different tools take different approaches, such as looking at applications’ permissions or at the usage of these permissions for malicious ends. Our research evaluates these tools with regard to this subject. Furthermore, we elaborate on the benefits of reducing permissions and malware for the privacy of Android users.

Morphological operators are a powerful approach to image processing. Although numerous proposals have been made, the basic morphological operators have no standard definitions in the case of colour imagery. In grey-scale images, the most basic morphological operators, dilation and erosion, are defined as respectively the maximum and minimum of a neighbourhood of pixels. But how can extrema be defined on colour values?

Most of the existing techniques treat the colour values as vector data, and define an ordering on these vectors. This allows the standard 2D grey-scale erosion and dilation methods to be used. Many different orderings can be used for this, like lexicographical or vector length ordering. The ordering can also be defined in different colour spaces, such as RGB or HSV.
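
To make the vector-ordering idea concrete, here is a small sketch (pure NumPy, written naively for clarity rather than speed) of a colour erosion that picks, in each neighbourhood, the pixel whose (R, G, B) vector is smallest under lexicographic ordering; a dilation would pick the largest instead. Note that the output pixel is always one of the original colour vectors, which relates to the vector-preservation property discussed below.

    import numpy as np

    def lex_erosion(img, radius=1):
        """Colour erosion using lexicographic ordering on (R, G, B) vectors."""
        h, w, c = img.shape
        out = np.empty_like(img)
        pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
        for y in range(h):
            for x in range(w):
                window = pad[y:y + 2 * radius + 1,
                             x:x + 2 * radius + 1].reshape(-1, c)
                # np.lexsort uses the last key as the primary key, so reverse
                # the channels to make R primary, then G, then B.
                order = np.lexsort(window.T[::-1])
                out[y, x] = window[order[0]]  # smallest vector in the window
        return out

    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    print(lex_erosion(img).shape)  # (8, 8, 3); every output pixel is an input colour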

Our research will investigate the performance of existing techniques for varying tasks. Since different techniques have different properties, it will be interesting to see how these properties affect a technique’s performance. Interesting properties in this case are vector preservation and equal treatment of different colour channels, since these are properties often considered important for colour images.

Several tasks will be considered, including noise removal and edge detection. Different properties will likely lead to different performance in these situations. As a consequence, we do not expect to find one ultimate solution, but hope to give a better view of the possibilities for achieving the desired results.

One of the most important discoveries in the field of psychopathology in the past century is that comorbidity (two or more mental disorders occurring at the same time) is not exceptional but rather prevalent. In recent years there has been increasing interest in applying graph analysis to networks constructed from symptoms, mental disorders and comorbidity. Up until now, research has tended to focus on centrality measures in graphs rather than on using weighted elements in graphs.

In our research we review literature concerning the usefulness of using weighted elements in link analysis algorithms on graphs to analyze comorbidity. Applying link analysis algorithms might yield interesting results when compared to existing methods; especially in finding causal relationships between symptoms and mental disorders.

We first state possible results that a psychopathology researcher could be interested in. Next, we assess webgraph algorithms, including PageRank and HITS, on their applicability and suitability for the analysis of symptom graphs. To do so, we compare the measured properties of a graph to the expected properties of applying these algorithms. As link analysis tends towards finding important nodes and paths, we expect that the algorithms will classify certain symptoms and disorders as prevailing, which differs from the results of centrality measures that only indicate hot spots within a graph. For psychopathologists, these results could be very interesting, since mental disorders and symptoms have up until now been assessed as mutually exclusive conditions.
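
A small sketch of the kind of analysis we have in mind, using NetworkX to run weighted PageRank and HITS on a toy symptom graph (the symptom names and edge weights below are invented purely for illustration):

    import networkx as nx

    # Toy directed symptom graph; edge weights are invented for illustration.
    G = nx.DiGraph()
    G.add_weighted_edges_from([
        ("insomnia", "fatigue", 0.8),
        ("fatigue", "concentration problems", 0.6),
        ("insomnia", "concentration problems", 0.3),
        ("concentration problems", "worry", 0.5),
        ("worry", "insomnia", 0.7),
    ])

    pagerank = nx.pagerank(G, weight="weight")
    hubs, authorities = nx.hits(G)

    print(sorted(pagerank.items(), key=lambda kv: -kv[1]))
    print(sorted(authorities.items(), key=lambda kv: -kv[1]))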

Image editing tasks concern either global changes (color/intensity corrections, filters, deformations) or local changes confined to a selection of the image. In this paper we are interested in achieving local changes that are restricted to a manually selected region. These changes range from removing slight distortions in images to replacing content in an image by novel content.

Classic tools achieve interactive cut-and-paste through cloning tools used for complete replacement of content, and image filters for slight changes. An example is the Clone Stamp in Adobe Photoshop. However, these tools result in visible seams and distortions, which can only be partly hidden by feathering along the contour of the local selection.
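
The feathering mentioned above can be sketched in a few lines (assuming NumPy and SciPy; the images and blur radius are placeholders): the binary selection mask is blurred into a soft alpha map, and the cloned patch is alpha-blended into the target, which softens but does not truly remove the seam.

    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(1)
    target = rng.random((128, 128))   # placeholder target image
    source = rng.random((128, 128))   # placeholder source (cloned) content

    # Binary selection mask: the region to be replaced.
    mask = np.zeros((128, 128))
    mask[32:96, 32:96] = 1.0

    # Feathering: blur the mask into a soft alpha map along the contour.
    alpha = ndimage.gaussian_filter(mask, sigma=4)

    # Alpha-blend the source into the target; the seam is softened, not removed.
    result = alpha * source + (1 - alpha) * target
    print(result.shape)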

We will describe and compare these approaches based on applicability, ease of use, speed and quality of resulting images. We will implement these methods, if possible, or use implementations readily available on the internet. We will create a fixed set of images to facilitate the comparison of the different methods, highlighting the advantages and disadvantages of each method.

Social networks are networks of individual actors related by some sort of social interaction, as known from Facebook and Twitter. There are specific regions or subsets within these social networks that are particularly interesting, namely the regions that are more densely connected than others, known as communities. A lot of research has been done on identifying communities within social networks; however, most of it involves static graph analysis that does not take time-based evolution into account. Since real-time social networks tend to change very rapidly over time, the existing identification strategies for detecting communities fall short, and new methods that handle this time constraint need to be developed.

Social networks that change quickly over time are called dynamic social networks. One of the key problems in dealing with dynamic social networks is volatility. Detecting communities within static graphs alone is already very expensive, so recomputing communities after every change is not an appealing option. One of the possibilities is to identify the community structure only once and to adaptively update this structure based on the activities that occur. Furthermore, it is possible to make use of heuristics to accurately identify communities in dynamic social networks.
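
As a baseline, the sketch below (using NetworkX; the adaptive rule is a deliberately naive illustration, not one of the published methods) detects communities once with a static modularity-based algorithm and then, when a new node arrives, attaches it to the neighbouring community it shares most edges with, instead of recomputing everything from scratch.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Static baseline: detect communities once on the current snapshot.
    G = nx.karate_club_graph()
    communities = [set(c) for c in greedy_modularity_communities(G)]

    # A new node appears with a few edges (a change in the dynamic network).
    G.add_edges_from([(99, 0), (99, 1), (99, 2)])

    # Naive adaptive update: place the new node in the existing community with
    # which it shares the most edges, instead of recomputing all communities.
    neighbours = set(G[99])
    best = max(communities, key=lambda c: len(c & neighbours))
    best.add(99)
    print([len(c) for c in communities])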

In this paper we compare different approaches for the identification of communities in dynamic social networks. We look at the scalability and at the accuracy of these approaches using datasets of different sizes. Furthermore, we look at the possibility to combine these different strategies in order to develop new methods to identify and keep track of the evolution of communities within real-time social networks.

Design patterns are commonly used to solve software design problems. In the process of selecting design patterns, the effect on software quality attributes is often overlooked. The main reason for this is the common belief that the negative influence on software quality is negligible. The software development field lacks knowledge of the real effect of design patterns on software quality, and the results of different studies on this subject are contradictory.

Our research will focus on a selection of the design patterns of the Gang of Four (GoF). This is done from the viewpoint of software developers, who have a lot of influence on the pattern selection process. Additional information is needed to improve the selection process of design patterns and thus increase software quality.

We will perform empirical research by conducting a survey among software developers from open source projects. Given our results, we will verify the findings of existing papers and generate valuable data on this subject. We expect to give more clarity on the rationale behind using design patterns, which design patterns are most frequently used, which quality attributes are considered most important, and the impact of design patterns on the software quality attributes from the ISO/IEC 9126 standard.

As the use and complexity of object-oriented software systems increase, it becomes necessary for organizations to maintain those systems in a cost-effective way. Maintainability is the quality attribute that indicates the ease with which a software system can be modified, debugged, or made to conform to changing requirements. Quantifying maintainability is necessary from the early phases of development, because it provides useful information to improve the design, the code and the overall software quality.

In order to quantify maintainability, several maintainability prediction models have been introduced over the last twenty years. Our research focuses on two recent maintainability prediction models, which have been characterized as the most accurate by an independent study. The first is based on multivariate adaptive regression splines (MARS) and the second on Bayesian networks. Both models are constructed using the metrics and dataset proposed by Li and Henry.
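
As a simplified stand-in for these models (scikit-learn's ordinary linear regression here, rather than an actual MARS or Bayesian-network implementation; the metric values are invented), the sketch below fits a regressor that predicts a maintenance-effort indicator from Li and Henry style object-oriented metrics such as WMC, RFC and LCOM:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Invented per-class metric values in Li & Henry style: [WMC, RFC, LCOM].
    X = np.array([
        [12, 30, 5],
        [40, 90, 20],
        [7, 15, 2],
        [25, 60, 12],
        [55, 120, 30],
    ])
    # Invented maintenance-effort indicator (e.g. number of changed lines).
    y = np.array([35, 160, 15, 95, 240])

    model = LinearRegression().fit(X, y)
    print(model.predict(np.array([[20, 50, 10]])))  # predicted effort for a new class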

In this study, we compare the above-mentioned models with respect to their techniques, the calculated metrics and the produced results. In addition, we will apply the prediction models to various open source projects. We will then compare the prediction accuracy of the two models to that of other pre-existing models used for predicting maintainability. The results show that the prediction accuracy of the models varies, depending on the characteristics of the data sets.

Technical debt is the result of writing code that is complex, hard to extend and maintain, poorly documented or poorly tested. Unlike with financial debt, it is often not known what impact incurring technical debt has on a software project, since its need for change or its impact on future requirements may not be known. Several methods have been developed to gain insight into the technical debt of software, in order to make guided decisions on how and when debt has to be paid. This helps software teams find a balance between implementing new requirements and keeping the code at high quality.
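
One simple indicator in this family is the technical debt ratio used by SQALE-style models: estimated remediation effort divided by an estimate of the effort it took to develop the code. A minimal sketch (the effort figures and the conversion factor are invented placeholders):

    def debt_ratio(remediation_hours, lines_of_code, hours_per_line=0.06):
        """Technical debt ratio: remediation effort / estimated development effort."""
        development_hours = lines_of_code * hours_per_line  # placeholder conversion
        return remediation_hours / development_hours

    # Invented figures: 120 hours of estimated fixes in a 50,000-line code base.
    print(f"{debt_ratio(120, 50_000):.1%}")  # 4.0%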

Some of these methods may be used in every software project, whereas more extensive methods require continuous evaluation and parameterization. The goal of our research is to establish a comparison between the methods on multiple aspects, to find the optimum technique depending on the project and team.

We will discuss proposed methods, their pros and cons, and how easily they can be integrated into the development process of a software project. Furthermore, we will discuss how the size of the project team and the experience of team members affect the effectiveness of a method. This allows us to give a recommendation on which method to use under what conditions, so that managing technical debt in a justified way comes within reach of a variety of projects.

The key to decreasing the amount of computational resources needed for molecular simulations of macromolecules in water is to decrease the degrees of freedom of the computational box and the amount of necessary solvent. Molecular simulations use a computational box for periodic boundary conditions. Most simulations are done in a box with a high degree of freedom, i.e. a rhombic dodecahedron. Decreasing this degree of freedom by transforming the computational box into a triclinic box results in a decrease of computational resources.

The solvent for a molecular simulation scales with the volume of the computational box. By using a near-densest lattice packing, this volume is decreased to a near-minimal volume necessary for the simulation, resulting in a decrease of solvent in the simulation. In this paper a method is presented for transforming any computational box into a triclinic box, together with two different methods to obtain the near-minimal volume of the box. The first method focuses mainly on finding a near-densest lattice packing of a shape M, while the second method discusses how this can be optimized into a faster method and how it can be used for ensembles.
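
To make the volume argument concrete: any periodic box can be written as three box vectors, and its volume is the determinant of the matrix they form. The sketch below uses the triclinic representation of a rhombic dodecahedron as given in, for example, the GROMACS manual, and compares its volume to a cubic box with the same image distance; the triclinic cell needs roughly 29% less volume, and therefore less solvent.

    import numpy as np

    d = 1.0  # minimum image distance (arbitrary units)

    # Triclinic box vectors of a rhombic dodecahedron (square xy-face form),
    # compared against a cube with the same image distance.
    dodecahedron = np.array([
        [d, 0.0, 0.0],
        [0.0, d, 0.0],
        [d / 2, d / 2, d * np.sqrt(2) / 2],
    ])
    cube = np.eye(3) * d

    v_dodec = abs(np.linalg.det(dodecahedron))
    v_cube = abs(np.linalg.det(cube))
    print(v_dodec / v_cube)  # about 0.71: roughly 29% less volume and solvent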