Copyright © Data Valley 2018

ABOUT
What is BigDatathon?

RADICA works with different universities each time to present BigDatathon, a competition focusing on the application of big data, data analytics, and data science. The competition gathers real business challenges and data from actual companies for you to solve.

It's NOT only for PolyU! We welcome ANY students and graduates from different universities, as well as startups and working professionals, to join the challenge!

Hurry and gather your team of 2-5 people and register! RADICA, UBS and PolyU's challenge topics are waiting for you!
*Please note that each team must choose ONE challenge from the list and cannot propose its own topic. The same topic can be worked on by more than one team.


 

 

 

 
WINNERS (2018)
Your time to shine! Register now and join us for 2018's Challenge!

Awards and winning teams

UBS Champion Award

  1. Charles Wong
  2. Lam Fong Pui
  3. Kaneko Shoyu
  4. Waqas Ali
  5. Sage Foh

RADICA Champion Award and Best Data Hunter Award

  1. Wai Ho, TSUI
  2. Wai Pan, Yik
  3. Lee, CHAN
  4. Wing Lam, Leung

POLYU Champion Award

  1. Ka Yu, Chan
  2. Tsz Hin, Chan
  3. Cheuk Yin, Wong

Most Innovative Award

  1. LUK, WING SAN
  2. KWOK Chun Ho Andy
  3. TSANG CHIU SING

Best Presentation Award

  1. Ho Yeung, Wong
  2. Siu Chun, Lo

 
TOPICS 2018
The 2018 topics will be updated on an ongoing basis. Stay tuned!

Company

Topic

UBS provides financial advice and solutions to wealthy, institutional and corporate clients worldwide, as well as private clients in Switzerland. UBS's strategy is centred on its leading global wealth management business and its premier universal bank in Switzerland, enhanced by Asset Management and the Investment Bank.

 

Challenge 1 

 

A large volume of content is generated every day on social media channels such as Weibo in China. For marketing purposes, UBS would like to know which content netizens in different cities of China find most interesting. With the results, UBS could focus on hosting more relevant client outreach events based on those interests.

 

You will be provided with over 6,000 Weibo posts (unstructured data) extracted from 8 KOLs between 1st Feb and 30th Apr 2018. These posts should first be grouped into categories of interest for easier analysis. You are required to:

 

A) Classify these Weibo posts into 13 categories of interest according to the category IDs below:

 

0. Stock

1. Bond

2. Oil

3. Gold

4. Real Estate

5. Chinese Art (painting/ drawing/ calligraphy)

6. Western Art (painting/ drawing/ calligraphy)

7. Jewellery

8. Artefacts

9. Golf

10. Car

11. Overseas Education

12. Young Children Education

 

B) Based on the likes, retweets and comments on each post, and the cities of its commentators, rank the 13 interest categories by popularity (a) within each city and (b) across all cities. The most popular interest should be ranked first.
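As a rough illustration of part B, the sketch below ranks categories by an engagement score. The challenge does not prescribe a popularity formula, so the unweighted sum of likes, retweets and comments is an assumption, as are the record layout, the sample values, and the simplification that each record's engagement is attributed to a single city.

```python
from collections import defaultdict

# Hypothetical records: (city, category_id, likes, retweets, comments).
records = [
    ("Beijing", 0, 120, 30, 15),
    ("Beijing", 9, 40, 5, 2),
    ("Shanghai", 0, 60, 10, 8),
    ("Shanghai", 4, 200, 50, 40),
    ("Beijing", 4, 90, 20, 10),
]


def engagement(likes, retweets, comments):
    # Assumed popularity measure: unweighted sum of all interactions.
    return likes + retweets + comments


def rank_categories(records):
    """Return (ranking per city, ranking across all cities)."""
    per_city = defaultdict(lambda: defaultdict(int))
    overall = defaultdict(int)
    for city, category, likes, retweets, comments in records:
        score = engagement(likes, retweets, comments)
        per_city[city][category] += score
        overall[category] += score

    def ordered(scores):
        # Most popular first; ties are broken by the lower category ID.
        return sorted(scores, key=lambda c: (-scores[c], c))

    city_rank = {city: ordered(scores) for city, scores in per_city.items()}
    return city_rank, ordered(overall)


city_rank, overall_rank = rank_categories(records)
print(city_rank)     # ranking of category IDs within each city
print(overall_rank)  # ranking of category IDs across all cities
```

A team could swap `engagement` for a weighted formula (for example, counting a retweet more heavily than a like) without changing the rest of the ranking logic.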

 

C) In addition, demonstrate how your results could add further value to UBS's business as part of a marketing solution.

 

At the end of the competition, please submit: 

 

  1. Your programming source code

  2. Your classification result in a txt file (please download the template here)

  3. Your ranking results in a single Excel file (please download the template here)

  4. A PowerPoint file to clearly state the methodology to run the algorithm and your marketing ideas

  5. A list of any external libraries used

 

Note: Data access method will be disclosed during the competition.

RADICA is a leading Big Data Solution Provider that offers secure, easy-to-use and high-quality data-driven solutions to early technology adopters in the new era of marketing. Headquartered in Hong Kong, RADICA helps multinational brands and enterprises discover hidden opportunities by connecting data across different sources with machine learning and big data technology, helping them further optimize the value of their databases.

 

 

Challenge 2 

 

As consumer demands evolve towards digitally enabled experiences, more data is becoming available on the Internet for the retail sector to analyse. Retailers are eager to obtain more external data (open data) to map to their internal data, in the hope of gaining further business insights and understanding trends.

 

You will be provided with a list of hyperlinks that represent different data sources on the Internet. Each link is assigned one of 3 Tiers of Difficulty according to how hard it is to write a crawler for it (Tier 1 = least difficult, Tier 3 = most difficult). You are required to crawl as much useful open data for the retail industry as possible, and to demonstrate how this external data could benefit the retail industry as part of a profitable solution in the future.

 

Participation Criteria:

i)      Each dataset should have at least 50 rows. If the dataset originally has fewer than 50 rows, please crawl all rows.

ii)     Each dataset should have at least 2 columns.

iii)    You need to crawl at least 5 datasets within 24 hours.

iv)     All the column features should be distinct.
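The criteria above can be checked mechanically before submission. The following is a minimal sketch, not an official validator: the function names, the dataset layout (a list of column names plus a list of rows), and the reading of "distinct column features" as "no repeated column names" are all assumptions.

```python
def check_dataset(columns, rows, source_row_count=None):
    """Return a list of criteria violations for one crawled dataset.

    source_row_count is the number of rows the original source holds,
    if known; sources with fewer than 50 rows must be crawled in full.
    """
    problems = []
    required_rows = 50 if source_row_count is None else min(50, source_row_count)
    if len(rows) < required_rows:
        problems.append(f"needs at least {required_rows} rows, got {len(rows)}")
    if len(columns) < 2:
        problems.append("needs at least 2 columns")
    # Assumed interpretation of criterion iv): column names must not repeat.
    if len(set(columns)) != len(columns):
        problems.append("column names are not distinct")
    return problems


def check_submission(datasets):
    """Map each failing dataset name to its problems; also check the count."""
    problems = {name: check_dataset(**d) for name, d in datasets.items()}
    problems = {name: issues for name, issues in problems.items() if issues}
    if len(datasets) < 5:
        problems["submission"] = [f"needs at least 5 datasets, got {len(datasets)}"]
    return problems


# Hypothetical usage with one small, valid dataset.
example = {"columns": ["product", "price"], "rows": [["tea", 12.5]] * 50}
print(check_submission({"shop_prices": example}))
```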

 

* One of the datasets carries an additional "UNICORN Bonus" because it is extremely difficult to crawl. If you manage to hunt it, your total score will be increased by 25%!

 

At the end of the competition, please submit: 

  1. Your programming source code

  2. Your crawling results in a single Excel file (please download the template here)

  3. A PowerPoint file to clearly state the methodology to run each crawler and your marketing ideas

  4. A list of any external libraries used

 

Note: The list of hyperlinks will be disclosed during the competition.

PolyU is a government-funded tertiary institution in Hong Kong with a total student headcount of about 29,000 students, including full-time and part-time students. It is fully committed to academic excellence in a professional context with a view to designing, developing and delivering application-oriented education and training programmes. It also engages in a broad portfolio of research and scholarly activities in a focused manner, with special emphasis on applied research.

 

Challenge 3
 

Admission to university is an important milestone in most people's lives. Many different forms of data related to university admission exist, and carrying out deep analytics on this variety of data can yield valuable information for high school students and other stakeholders.

 

This challenge consists of two parts: (i) predictive analytics and (ii) descriptive analytics. In predictive analytics, your task is to predict the maximum and minimum HKDSE scores obtained by 2012-2017 admittees to three degree programmes of PolyU, namely Computing, Business and Nursing. As the history of HKDSE is short and the publicly available data is limited, the challenge lies in making use of auxiliary data, such as data from other related degree programmes, to improve prediction accuracy. You will be provided with some sample data for this part, and you are welcome to incorporate more auxiliary data. Your work will help PolyU better understand the attractiveness of different programmes.

 

In descriptive analytics, you are asked to crawl useful and relevant data from websites and online forums, for which a recommended list will be provided, and then mine/analyse the data to discover insights about PolyU’s programmes. While the type of analysis is open, you may consider carrying out a proper clustering of the available JUPAS degree programmes based on their admission requirements and historical application and admission data (as in predictive analytics). It is expected that those programmes within a cluster are closely related to each other in admitting quality students and universities can benefit a lot from such information for future programme design and marketing strategy.

 

At the end of the competition, please submit:

1.    Your programming source code

2.    Your results in predictive analytics (see this sample submission file)

3.    Your results in descriptive analytics, including the crawled data and possibly the cluster information (see the sample submission file)

4.    A document (i.e. a PowerPoint presentation file) highlighting your formulation, methodology, findings and tools used.

 

Note: Data access method will be disclosed during the competition.

 

 

 

 

 
PRIZES
2018 Datathon
CASH PRIZES

Awards                      Prize Details

UBS Champion Award          HK$10,000
RADICA Champion Award       HK$10,000
POLYU Champion Award        HK$10,000
Best Data Hunter Award      HK$3,000
Most Innovative Award       HK$2,500
Best Presentation Award     HK$2,000

Footnotes
* Outstanding teams may be offered work or internship opportunities by sponsors
** Award distribution is subject to change
Post-Datathon Event
Participants with good performance (not limited to the winning team members) will be invited to our post-Datathon networking event with the representatives of organizers and sponsors. It would be a great opportunity for you to network with great companies and be one step closer to your dream job or to kick-start your startup idea!
 
SCHEDULE
26 MAY (SAT)

Time             Rundown

09:30 - 10:00    Registration
10:00 - 11:50    Opening Ceremony (Venue: PolyU, N003)
11:50 - 12:30    Team Formation Session (Optional)
11:50 - 15:00    Teams are finalized and submitted via Submission form
13:15            Datathon starts (Venue: PolyU, N102 & N103). Team formation deadline: 15:00 (2-5 ppl per team)
13:15 - 17:00    Mentoring session available
19:00 - 23:59    Dinner & Overnight development period*

27 MAY (SUN)

Time             Rundown

00:00 - 13:15    Overnight/Morning development period*
13:15            Datathon ends
13:30 - 14:30    First-round Judging; Equipment testing**; Finalists Selection Process
14:30            All participants MUST attend Finalists Presentation (Venue: PolyU, N003)
14:30            Finalists Presentation starts. Each team has 7 mins. to present + 3 mins. Q & A
17:15 - 18:00    Winner announcement; Award and Closing Ceremony

Footnotes
* Teams can decide to go home or work overnight. The organizing committee will provide minimal support during the overnight period
** Each team can send 1-2 representative(s) to the presentation venue. The organizing committee will help them test the equipment. Please note that this is not a rehearsal period.
 
 
 
JUDGES AND JUDGING CRITERIA
JUDGING CRITERIA & Suggested Presentation Format

 

Judging Criteria for UBS Challenge:

 

  • 30%           Classification result based on F1 Score

  • 30%           Creativity of the ranking methodology and quality of the visualization of the findings

  • 20%           Demonstrated capability to add value to UBS's business with a marketing solution based on the findings

  • 20%           Communication skills in presenting the findings and defending them during the Q&A session
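For teams wanting to self-check the classification part, the sketch below computes a macro-averaged F1 score in plain Python. The criteria above do not say which averaging the judges use (macro, micro or weighted), so macro averaging is an assumption, and the labels in the example are made up.

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-category F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            # No true or predicted instances were matched for this label.
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical example with 3 category IDs; the real task uses IDs 0-12.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(macro_f1(y_true, y_pred, labels=[0, 1, 2]), 4))
```

Macro averaging treats every category equally, so rare categories such as Golf or Artefacts would count as much as Stock; a micro or weighted average would favour the frequent categories instead.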

 

BONUS Points:

* 10 bonus points: Adoption of APIs/data from the data portal of Data Studio @ Science Park

* 20 bonus points: Adoption of APIs, software tools, cloud computing resources (e.g. Azure) or frameworks provided by Microsoft

Presentation format for semi-final:

1 minute       Run the program, show the source code, the classification and ranking results (based on the result template given)

2 minutes     To demonstrate any creativity of the ranking methodology and data visualization of the findings

2 minutes     To demonstrate how you can further optimize the UBS business value based on your findings

 

 

 

Judging Criteria for RADICA Challenge:

 

 

  • 30%        Data crawling capability to harvest as much high-quality, fresh data as possible

                    Dataset of Tier 1 = 1 score
                    Dataset of Tier 2 = 2 scores
                    Dataset of Tier 3 = 3 scores

                    Score = [No. of Tier 1 datasets × 1 score] + [No. of Tier 2 datasets × 2 scores] + [No. of Tier 3 datasets × 3 scores]

                    Additional Scores:

                      • 2 scores for the team with the maximum number of columns crawled among Tier 1 datasets
                      • 4 scores for the team with the maximum number of columns crawled among Tier 2 datasets
                      • 6 scores for the team with the maximum number of columns crawled among Tier 3 datasets

  • 30%        Demonstration of your crawling approach and methodology for improving crawler efficiency (e.g. how to overcome log-in systems and access control systems, or how to rule out noise)

  • 20%        Demonstrated capability to add value to the retail business with a profitable solution based on the findings

  • 20%        Communication skills in presenting the findings and defending them during the Q&A session

 

BONUS Points:

* UNICORN Bonus: Total score will be increased by 25% if the team manages to crawl the specific dataset carrying the UNICORN bonus

* 10 bonus points: Adoption of APIs/data from the data portal of Data Studio @ Science Park

* 20 bonus points: Adoption of APIs, software tools, cloud computing resources (e.g. Azure) or frameworks provided by Microsoft
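The tier scoring formula and the UNICORN bonus for the RADICA challenge can be worked through as below. This is only a sketch of the arithmetic: the function names and the dataset counts are hypothetical, and the page does not state how the raw tier score maps onto the 30% component or exactly which total the 25% increase applies to, so applying it to the raw tier score is an assumption.

```python
# Scores per dataset, as listed in the judging criteria.
TIER_POINTS = {1: 1, 2: 2, 3: 3}


def tier_score(tier_counts):
    """Sum of (number of datasets in a tier) x (score for that tier)."""
    return sum(TIER_POINTS[tier] * count for tier, count in tier_counts.items())


def with_unicorn_bonus(total, crawled_unicorn):
    """Increase the total by 25% if the UNICORN dataset was crawled."""
    return total * 1.25 if crawled_unicorn else total


# Hypothetical team: 3 Tier 1, 2 Tier 2 and 1 Tier 3 datasets.
raw = tier_score({1: 3, 2: 2, 3: 1})  # 3*1 + 2*2 + 1*3
print(raw, with_unicorn_bonus(raw, crawled_unicorn=True))
```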

 

Presentation format for semi-final:

1 minute       Run the program, show the source code and the crawling results

2 minutes     To demonstrate your data crawling methodology and any techniques to improve the efficiency of crawlers

2 minutes     To demonstrate how you can further optimize the retail business value based on your findings

 

 

 

 

 

Judging Criteria for PolyU Challenge:

 

  • 30%           Predicted result based on mean squared error w.r.t. ground truth (predictive analytics part)

  • 30%           Quality of data crawling (in terms of variety and diversity), feature engineering (in terms of representativeness), and methodology used (in terms of effectiveness) (descriptive analytics part)

  • 20%           Demonstration of capability and innovation to provide insights to university admission, programme design and marketing strategy.

  • 20%           Communication skills to present the findings and the ability to explain and/or defend during Q&A session
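The mean squared error used for the predictive analytics part can be computed as in the sketch below. The score values are invented for illustration, and how the judges aggregate errors across programmes, years, and maximum/minimum scores is not stated here.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between ground truth and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ground-truth and predicted HKDSE scores.
truth = [30, 22, 28, 20]
predicted = [29, 23, 28, 22]
print(mean_squared_error(truth, predicted))  # (1 + 1 + 0 + 4) / 4 = 1.5
```

Because errors are squared, one prediction that is off by 2 points costs as much as four predictions that are each off by 1 point.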
     

BONUS:

* 10 bonus points: Adoption of APIs/data from the data portal of Data Studio @ Science Park

* 20 bonus points: Adoption of APIs, software tools, cloud computing resources (e.g. Azure) or frameworks provided by Microsoft

 

 

Presentation format for semi-final:

2 minutes     Demonstration of the running program, source code, and predictive and descriptive results
2 minutes     Articulation of ideas, approaches, and findings, and self-evaluation of limitations in both predictive and descriptive parts
1 minute      Comment on how you can further optimize the solutions and generate deeper insight for this challenge in the future

 

 

 

 

 

 

The format for Final Presentation (on stage at Room N003): 

Each team has 7 mins. to present + 3 mins. Q & A

 
RULES AND REGULATIONS
 
  • Participants are required to bring their own computers/laptops and any other technical equipment they need for this event

  • In addition to the given datasets, teams are free to use any open data or crawl data from the Internet

  • A team must consist of at least 2 and no more than 5 participants. Participants MUST register and show up in person on the morning of Day 1

  • Each challenge will be assigned a dedicated mentor to help you understand the problem and provide guidance during the Datathon prototyping stage

  • All teams must begin coding at the same time. All development work must be done within the development period stated in the event schedule

  • Each team should work on exactly one topic selected from the list. For this competition, teams can't propose their own topics

  • PowerPoint (or equivalent) presentations are allowed

  • The organizing committee reserves the right to disqualify participants for lateness, absence, or improper dress or behaviour during the Datathon

  • During the event, participants are responsible for their own personal belongings; the event organiser will not take responsibility for any lost property

  • Teams are required to agree to an NDA upon submitting team information

  • Campus Control Centre
    Tel no.: 2766 7666 (24 hours)
    Tel no.: 2766 7999 (For emergency)

Copyright © RADICA BigDatathon 2018