Hackathons are an addictive pursuit for many professionals. It is often assumed that people enter to win and to add weight to their resume. That is partly true, but most participants are there for the thrill. A hackathon lasts just a day or two, during which the teams have to slog to come up with a cool solution. Sleepless nights, endless coffee, cluttered code, rapid-fire decisions, on-the-fly collaboration, and bickering over failed attempts are all common. A data science hackathon is a bit different from other coding hackathons, yet it ticks every one of those boxes. The task is to solve a data-based problem. Because the data and the problem statement are provided, half the problem may seem solved already. It is a different ball game, however, because it tests participants' analytical skills: working with a variety of data sets and drawing meaningful insights from them. If you are planning to participate in a data science hackathon, brace yourself with the following tips.
Data science hackathons come with the problem already defined. Finding a solution to your own problem is one thing; solving someone else's is quite another. Many participants fail at this very first step. Read as much as possible about the problem and its attributes. Choose one wrong attribute for model building and the whole process will go haywire. Go through the information thoroughly before you start writing code, because only the insights you develop can guide you through the code development.
Building a set of hypotheses is crucial for a data science project, because it tells you what to look for. Basically, it means asking yourself the right questions about what you want to find out before deep-diving into the data. The meaning of the numbers changes according to the question you have in mind. For example, if you want to find out how a clothing brand affects the way people adopt formal wear, you need weekly data on office-goers; for holiday shoppers, data from around Christmas might be enough, depending on the country under study.
Team building and collaboration lie at the core of a hackathon. Identifying people with unique skills who can perform under pressure is crucial for getting the wheels rolling. In other words, a hackathon can be considered a team-building exercise because, after the core skills, team-building skills are the ones put to the test. Teams are mostly made up of around 5 to 6 members, including coders, managers, designers, and so on. Although including people with diverse skill sets is sometimes necessary, assigning tasks based on each member's core skills can save you from taking unnecessary risks.
You may be a good programmer, but that doesn't mean you need to build everything from scratch. You can depend on a generic code base for certain repetitive functions, such as generating time modules or creating a base module on which other dependent modules can be built. It all boils down to the hackathon rules, i.e., whether access to libraries is allowed, whether the process is judged rather than the result, and so on. The entire hackathon comes down to the two-minute pitch of the final product, so the focus should be on making that perfect and not on the bits and pieces you tinker with along the way. Here too, you should know where to draw the line, because not all the details of the code are required for the demo.
Data science is about building predictive models, and these require insights extracted from data. Feature engineering addresses exactly this aspect. It is the ability to look at the data from a different perspective to draw out the information required to solve the problem at hand. For example, if data from an FMCG company is provided to find the product with potential for cross-cultural adoption (as in the case of Indian spices), it would be more meaningful to look at the geographical distribution than to dig into temporal data.
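As a rough illustration of the FMCG example, the sketch below turns raw sales rows into per-product geographic features. The records, the field names (`product`, `region`, `units`), the `home_region` parameter, and the two derived features are all made up for illustration; a real data set would call for its own choices.

```python
from collections import defaultdict

# Hypothetical sales records; the values are invented for this sketch.
sales = [
    {"product": "turmeric", "region": "IN", "units": 120},
    {"product": "turmeric", "region": "UK", "units": 30},
    {"product": "turmeric", "region": "US", "units": 50},
    {"product": "biscuits", "region": "IN", "units": 200},
]

def geographic_features(records, home_region="IN"):
    """Derive two per-product features from raw rows:
    n_regions    - number of distinct regions where the product sold
    export_share - fraction of units sold outside the home region
    """
    units_by_region = defaultdict(lambda: defaultdict(int))
    for row in records:
        units_by_region[row["product"]][row["region"]] += row["units"]

    features = {}
    for product, regions in units_by_region.items():
        total = sum(regions.values())
        outside = total - regions.get(home_region, 0)
        features[product] = {
            "n_regions": len(regions),
            "export_share": outside / total if total else 0.0,
        }
    return features

features = geographic_features(sales)
```

A product that sells in several regions with a high share outside its home market would be a candidate for cross-cultural adoption, which raw timestamps alone would not reveal.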
Ensemble modelling is nothing but combining the predictions of different models to improve the stability and predictive capacity of an existing machine learning model. It usually takes forms such as boosting, stacking, blending and bagging. It is like seeking a second opinion on the existing models. Ensembles are very widely used in competitions because they are one of the most reliable ways to improve on existing models.
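The two simplest ways of combining models can be sketched in a few lines: a majority vote over class labels and a plain average of predicted scores (a basic form of blending). The model outputs below are invented, and the function names are hypothetical helpers, not part of any library.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions_per_model):
    """Combine class labels from several models, row by row.
    Each inner list holds one model's predictions for the same rows."""
    return [
        Counter(row_votes).most_common(1)[0][0]
        for row_votes in zip(*predictions_per_model)
    ]

def average_blend(scores_per_model):
    """Average the predicted scores of several models, row by row."""
    return [mean(row_scores) for row_scores in zip(*scores_per_model)]

# Invented predictions from three hypothetical classifiers on four rows.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
voted = majority_vote([model_a, model_b, model_c])

# Invented probability scores from two hypothetical models on two rows.
blended = average_blend([[0.9, 0.2], [0.7, 0.4]])
```

With an odd number of binary classifiers the vote never ties; with other setups, ties fall to whichever label was seen first, so a weighted vote may be a better fit.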
Not having a validation framework is like punching in the dark. To ensure your model is robust and reliable, it should be tested against several subsets of the data, using different training and validation splits. Many people just dump the data into their models and let them validate themselves against a few benchmarks. That only makes the model vulnerable to overfitting, leakage and other evaluation problems. Apart from having a robust validation framework, avoid relying on the public leaderboard, as doing so can lead to a significant drop in your private ranking.
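One common way to test against several subsets is k-fold cross-validation. The sketch below only produces the index splits, using the standard library; `k_fold_splits` is a hypothetical helper written for this example, and in practice a library routine would usually be used instead.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, valid_indices) pairs for k-fold cross-validation.
    Every sample lands in the validation set of exactly one fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits repeatable
    folds = [indices[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

splits = list(k_fold_splits(10, k=5))
for train_idx, valid_idx in splits:
    # Fit the model on train_idx and score it on valid_idx here,
    # then average the k scores instead of trusting a single split.
    assert not set(train_idx) & set(valid_idx)
```

Averaging the score over all folds gives a local estimate that does not depend on the public leaderboard. Note that a plain shuffle like this is not appropriate for time-ordered data, where later rows must not leak into training.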
Building a data science project involves coding from scratch or using code from different sources. If any member of the team cannot get to the right code at the right time, it can spell disaster for the project. This happens most of the time because the team keeps its code unorganized. The moment the hackathon starts, the room quickly turns into an adrenaline chamber, and you will forget to keep things sorted. You may write rough code, copy-paste from an earlier project, or borrow from Stack Overflow. Whatever the source, making sure the code is accessible to everyone is crucial, because every minute counts in a hackathon.
Usually, model building is a linear process that involves data cleaning, EDA, feature engineering, model building and evaluation, a framework that most programmers follow religiously. At times, however, the process may not yield the desired result, and in such circumstances it is advisable to go back and re-examine the algorithm or the parameters you chose. For example, a model built for a product might show a gender disparity simply because the gender variable was ignored.
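One quick way to look back at a disappointing result is to break the model's errors down by a variable that was left out. The sketch below does this for the gender example; the labels, predictions and group values are invented, and `error_rate_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def error_rate_by_group(y_true, y_pred, groups):
    """Return the share of wrong predictions within each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Invented labels and predictions from a model trained without the gender column.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
gender = ["M", "M", "M", "F", "F", "F"]

rates = error_rate_by_group(y_true, y_pred, gender)
```

A large gap between groups suggests the ignored variable carries signal, which is a cue to revisit feature selection rather than to keep tuning the same model.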
Communication, both internal and external, is critical, especially in a fast-paced, process-driven environment. Reach out to professionals online to discuss problems whenever you hit a roadblock. There may be a missing detail here or a sub-programme that needs improvement there, and discussion forums such as Reddit and Kaggle can help. As a team, keep up the camaraderie of normal communication, however tense the circumstances. Whether it is a last-minute change or an inability to complete a particular task, keep the other members of the team updated.