I'm a research associate at the Performant and Available Distributed Computing Systems Lab, York University. My research focuses on performant, secure, and optimized software systems, big data analytics, and empirical software engineering.
I have strong knowledge of and experience in serverless computing, cloud applications with high availability and scalability, and large-scale distributed data collection and processing. I have extensive hands-on experience with AWS Lambda, Docker, and Kubernetes. Currently, my research focuses on performance modeling and optimization of serverless applications. I am also an experienced developer with a strong background in Django, Flask, PostgreSQL, Elasticsearch, Metricbeat, and Kibana.
Sep 2021 - Present
Working on performance modeling and optimization of serverless applications hosted on public cloud platforms.
Working on the project: Optimizing Serverless Platforms using Machine Learning
Sep 2018 - Aug 2019
Worked on performance modeling and optimization of functions in the serverless framework.
In collaboration with BlackBerry, worked on the autonomous security management of IoT systems. Implemented blockchain-based techniques to improve the security of cyber-physical systems.
Jan 2018 - April 2018
Led the project to develop a high-availability system for automatically generating and marking unique assignments and examination papers varying with students' ID numbers.
Built and maintained multi-tenant Linux servers for high performance computing, machine learning, and big data analytics.
Sep 2018 - Aug 2021
Supervisor: Dr. Hamzeh Khazaei
Coursework: Machine Learning in Software Engineering, Data Analysis and Knowledge Discovery, Software Quality, Dependable Systems, Cyber-Physical Systems
Sep 2014 - July 2018
Selected Coursework: Computer Architecture and Microprocessors, Object-oriented Programming, Algorithms and Data Structures, DSP, IC Design
Awarded National Scholarship and Outstanding Undergraduate Thesis
In this paper, we present a methodology based on blockchain smart contracts to describe, grant, and revoke fine-grained permissions for building users in a decentralized fashion. This method supports access control using resource description framework (RDF) graphs and implements two APIs for client applications. Leveraging the metadata of a real building, we have applied the proposed method to manage privileges in several realistic use cases and shown that it can greatly reduce administration overhead while providing fine-grained access control.
The main concept behind serverless computing is to build and run applications without the need for server management. It refers to a fine-grained deployment model in which applications, comprising one or more functions, are uploaded to a platform and then executed, scaled, and billed in response to the exact demand at the moment. While major cloud vendors such as Amazon, Google, Microsoft, and IBM now provide serverless computing, their approach to placing functions, i.e., the associated containers or sandboxes, on servers is oblivious to the workload, which may lead to poor performance and/or higher operational cost for software owners. In this paper, using statistical machine learning, we design and evaluate an adaptive function placement algorithm that serverless computing platforms can use to optimize the performance of running functions while minimizing operational cost. Given a fixed amount of resources, our smart spread function placement algorithm achieves higher performance than existing approaches by maintaining users' desired quality of service for longer, which prevents premature scaling of cloud resources. Extensive experimental studies reveal that the proposed adaptive function placement algorithm can be easily adopted by serverless computing providers and integrated into container orchestration platforms without introducing any limiting side effects.
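To illustrate the general idea of workload-aware placement, here is a minimal sketch of a spread-style heuristic that scores candidate hosts by remaining resource headroom. This is an illustrative simplification, not the paper's actual algorithm; the `Server` class and `place_function` helper are hypothetical names invented for this example.

```python
# Illustrative sketch of a workload-aware function placement heuristic.
# All names (Server, place_function) are hypothetical, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    cpu_capacity: float          # remaining CPU shares
    mem_capacity: float          # remaining memory (MB)
    placed: list = field(default_factory=list)

def place_function(servers, fn_cpu, fn_mem):
    """Pick the host with the most remaining headroom that still fits the
    function's resource profile; return None to signal scale-out."""
    candidates = [s for s in servers
                  if s.cpu_capacity >= fn_cpu and s.mem_capacity >= fn_mem]
    if not candidates:
        return None  # no host fits: trigger scale-out rather than overload
    # Spread load: prefer the host with the largest post-placement slack.
    best = max(candidates,
               key=lambda s: min(s.cpu_capacity - fn_cpu,
                                 s.mem_capacity - fn_mem))
    best.cpu_capacity -= fn_cpu
    best.mem_capacity -= fn_mem
    best.placed.append((fn_cpu, fn_mem))
    return best
```

A workload-aware variant would replace the slack score with a learned model of interference between co-located functions, which is the gap the paper's approach targets.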
Online resources today contain an abundance of code snippets for documentation, collaboration, learning, and problem-solving purposes. Their executability in a "plug and play" manner lets us confirm their quality and use them directly in projects. In practice, however, that is often not the case due to requirement violations or incompleteness. Investigating executability at a large scale is difficult because many different errors can occur during execution. We have developed a scalable framework to investigate this for SOTorrent Python snippets. We found that, with minor adjustments, 27.92% of snippets are executable. Executability has not changed significantly over time. Code snippets referenced on GitHub are more likely to be directly executable, but executability does not significantly affect the chances of an answer being selected as the accepted answer. These findings help us understand and improve how users interact with online resources that include code snippets.
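The core check in such a framework can be sketched as running each snippet in a fresh interpreter and classifying the failure, if any. This is a minimal sketch under simplifying assumptions (no sandboxing, crude error classification from the traceback); `check_executable` is a hypothetical helper, not the framework's real API.

```python
# Illustrative sketch of checking whether a Python snippet executes cleanly.
# The real framework runs at scale with sandboxing and richer classification.
import os
import subprocess
import sys
import tempfile

def check_executable(snippet: str, timeout: int = 10):
    """Run a snippet in a fresh interpreter; return (ok, error_kind)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(snippet)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        if proc.returncode == 0:
            return True, None
        # Crude classification: exception name from the last traceback line.
        last = proc.stderr.strip().splitlines()[-1] if proc.stderr else ""
        return False, last.split(":")[0]
    except subprocess.TimeoutExpired:
        return False, "Timeout"
    finally:
        os.unlink(path)
```

Aggregating the returned error kinds over a corpus is what enables statements like "27.92% of snippets are executable with minor adjustments."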
Function-as-a-Service (FaaS) and serverless applications have proliferated significantly in recent years because of their high scalability, ease of resource management, and pay-as-you-go pricing model. However, cloud users face practical problems when migrating their applications to the serverless pattern, namely the lack of analytical performance and billing models and the trade-off between a limited budget and the desired quality of service. In this paper, we fill this gap by proposing and answering two research questions regarding the prediction and optimization of the performance and cost of serverless applications. We propose a new construct to formally define a serverless application workflow, and then implement analytical models to predict the average end-to-end response time and the cost of the workflow. We then propose a heuristic algorithm named the Probability Refined Critical Path Greedy algorithm (PRCP) with four greedy strategies to answer two fundamental optimization questions regarding performance and cost. We extensively evaluate the proposed models through experiments on AWS Lambda and Step Functions. Our analytical models predict the performance and cost of serverless applications with more than 98% accuracy, and the PRCP algorithms achieve the optimal configurations of serverless applications with 97% accuracy on average.
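For intuition on the prediction side, a minimal sketch for the special case of a purely sequential workflow: by linearity of expectation, the mean end-to-end latency is the sum of the per-function means, and cost sums per-function billed GB-seconds. This is an illustrative simplification, not the paper's PRCP algorithm; the function name and pricing constants are assumptions (rates modeled on typical AWS Lambda pricing, which varies by region).

```python
# Illustrative sketch: predicting mean latency and cost of a *sequential*
# serverless workflow from per-function statistics. Constants are assumed.
import math

GB_SECOND_PRICE = 0.0000166667   # assumed per GB-second compute price
INVOCATION_PRICE = 0.0000002     # assumed per-request price

def predict_sequential(functions):
    """functions: list of (mean_latency_ms, memory_mb) tuples.
    Returns (expected end-to-end latency in ms, expected cost in USD)."""
    # Sequential execution: expected total latency is the sum of means.
    total_ms = sum(mean for mean, _ in functions)
    cost = 0.0
    for mean_ms, mem_mb in functions:
        billed_s = math.ceil(mean_ms) / 1000.0   # 1 ms billing granularity
        cost += billed_s * (mem_mb / 1024.0) * GB_SECOND_PRICE
        cost += INVOCATION_PRICE
    return total_ms, cost
```

Branches and parallel states require the probability-weighted critical-path reasoning that PRCP refines, which is where the simple sum above stops being sufficient.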
Docker is currently one of the most popular containerization solutions. Previous work has investigated various characteristics of the Docker ecosystem, but has mainly focused on Dockerfiles from GitHub, limiting the types of questions that can be asked, and has not investigated evolution aspects. In this paper, we create a recent and more comprehensive data set by collecting data from Docker Hub, GitHub, and Bitbucket. Our data set contains information about 3,364,529 Docker images and the 378,615 git repositories behind them. Using this data set, we conduct a large-scale empirical study with four research questions where we reproduce previously explored characteristics (e.g., popular languages and base images), investigate new characteristics such as image tagging practices, and study evolution trends. Our results demonstrate the maturity of the Docker ecosystem: we find more reliance on ready-to-use language and application base images as opposed to yet-to-be-configured OS images, a downward trend in Docker image sizes demonstrating adoption of the best practice of keeping images small, and a declining trend in the number of smells in Dockerfiles suggesting a general improvement in quality. On the downside, we find an upward trend in the use of obsolete OS base images, posing security risks, and problematic usages of the latest tag, including version lagging. Overall, our results bring good news, such as more developers following best practices, but they also indicate the need to build tools and infrastructure embracing new trends and addressing potential issues.
This work has been submitted for peer review and is still under review; details will be available soon.
Manual admittance testing of solar cells is not only time-consuming and laborious, but also subject to human factors that affect the test results. Based on the practical demands of the Advanced Optoelectronic Technology Laboratory of East China Normal University, an admittance testing system for solar cells with automatic data retrieval was designed and implemented. This work presents the realization of the instrumentation and the automated data acquisition and control systems. The development of the testing software, such as the software used for voltage/frequency vs. capacitance testing and for temperature/frequency vs. capacitance testing, is also presented. The system can automatically perform admittance testing according to parameters given by users, retrieve and visualize data in real time, and process data to obtain results algorithmically. The system has high robustness and extensibility, which greatly improves the efficiency and accuracy of testing and provides, to some extent, a technical reference for the realization of instrumentation and automated data acquisition and control systems for other instruments.
Room 2017, Lassonde Building,
Keele Campus, York University,
changyuan.lin [at] hotmail [dot] com