One Point Solution

Sunday, 15 September 2013

Attempts to organize my topcoder folder

Posted on 15:37 by Unknown

When I migrated from KawigiEdit to Greed, the biggest issue was that, thanks to KawigiEdit, all my topcoder source code files were in a single folder. Greed, on the other hand, organizes them by default into sub-folders named after the contest. If I wanted to use Greed's organization powers, I would first need to organize all the files created by KawigiEdit, moving each source file to its respective contest folder.

As it turns out, my folder contained code for problems from 360 different topcoder contests! So I needed an automatic method, and there was no other choice but to make a Python script. I refurbished some of the work I put into the handle tag script so that it searches for each problem name in the problem archive, parses the results to get the contest name, and then does the move.
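To make the parsing step concrete, here is a small sketch of what the script's contest regex pulls out of a search-result row. Only the regex comes from the script. The two HTML rows are made-up samples shaped to fit it (the link targets, the pm/rd numbers, the class attributes, the problem name and the contest names are all placeholders, not markup copied from TopCoder), and contests_for is a hypothetical helper name.

```python
import re

# Same pattern as REGEX_CONTEST in the script, written as a raw string.
REGEX_CONTEST = re.compile(
    r'([a-zA-Z0-9]+)\s*<.A>\s*<.TD>\s*<TD[^>]+>\s*'
    r'<A\s+HREF="[^=]+[ce]=[a-zA-Z]+ound[a-zA-Z_]+verview&rd=[0-9]+"[^>]+>'
    r'\s*(.+)\s*</A>')

# Made-up row in the "c=round_overview" style (assumed shape).
SRM_ROW = (
    '<A HREF="/stat?c=problem_statement&pm=1" CLASS="statText">ExampleProblem</A></TD>'
    '<TD CLASS="statText">'
    '<A HREF="/stat?c=round_overview&rd=100" CLASS="statText">SRM 123</A>')

# Made-up row in a "module=...RoundOverview" style (assumed shape).
HS_ROW = (
    '<A HREF="/stat?c=problem_statement&pm=2" CLASS="statText">ExampleProblem</A></TD>'
    '<TD CLASS="statText">'
    '<A HREF="/tc?module=HSRoundOverview&rd=200" CLASS="statText">TCHS SRM 12</A>')

def contests_for(html, class_name):
    """Return every contest name listed next to an exact class-name match."""
    return [contest for name, contest in REGEX_CONTEST.findall(html)
            if name == class_name]

print(contests_for(SRM_ROW, 'ExampleProblem'))
print(contests_for(HS_ROW, 'ExampleProblem'))
```

The loose pieces of the pattern ([ce]=, ...ound, ...verview) are what let a single expression accept both link styles.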

It wasn't an incredibly trivial problem to solve, though:

  • Some problems were used in multiple contests. Which one to choose?
  • Some of my problems couldn't be found in the statistics (round canceled, a bug, or the match was never added to the statistics because it was special).
  • The results have a different format for TCHS matches and other contests.
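For the first point, the script's better() function compares two contest names at a time. The same priority can be sketched as a sort key, assuming the order of checks used in the script: avoid parallel rounds, then prefer names that start with "SRM", then names that merely contain "SRM". The names contest_rank and pick_contest, and the parallel round name in the example, are made up for illustration.

```python
def contest_rank(name):
    """Lower tuples are preferred; mirrors the order of checks in better()."""
    return (
        'arallel' in name,           # parallel rounds go last
        not name.startswith('SRM'),  # names starting with "SRM" win
        'SRM' not in name,           # then names merely containing "SRM"
    )

def pick_contest(candidates):
    # min() returns the first candidate on ties, like "return c1" in better().
    return min(candidates, key=contest_rank)

print(pick_contest(['Member SRM 471', 'SRM 377']))
print(pick_contest(['TCO08 Qual 3A', 'Member SRM 471']))
print(pick_contest(['Example Parallel Round', 'TCO08 Qual 3A']))
```

A key function is only one way to write it; the pairwise version in the script gives the same answer when folded over the list with reduce.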

There is even a problem I couldn't solve in any way: for many tournament rounds, the contest name differs between the practice room and the problem archive (booo!).

But anyway, here is my script in case anyone needs it. You are supposed to run it from the folder that contains all the problem codes. The first version most likely only worked on Unix-like OSes, because it used / for folder paths and commands like mv and mkdir.

Update: the script is now Windows-friendly, thanks to Bjarki Ágúst Guðmundsson.


# Topcoder folder organizer
#
# err let us say it is released under the zlib/libpng license
# http://www.opensource.org/licenses/zlib-license
# (c) Victor Hugo Soliz Kuncar, 2013
#
# Note: this is Python 2 code (urllib.urlencode, urllib.urlopen and the built-in reduce).
import urllib, re, os, time

# Rudimentary proxy support
# For example:
# PROXIES = {'http': 'http://7.7.7.7:3128'}
PROXIES = None

# Will try all file names with these extensions, and assume they are ClassName.extension :
EXTENSIONS = [".cpp", ".java", ".cs", ".vb", ".py"]

BLACK_LIST = [ "template.cpp", "organize.py" ] #put any files that you are pretty sure are not problem source codes


#Special cases, problems that are not in the web site.
#Not by any chance an exhaustive list:
SPECIAL = {
    # NetEase rounds are not in statistics, let's put them all in the same folder:
    'UnrepeatingNumbers'    : 'NetEase',
    'MaximumPalindromeScore': 'NetEase',
    'RobotKing'             : 'NetEase',
    # This qualification round was canceled and renamed 3A, not in stats:
    'ChessTourney' : 'TCO08 Qual 3A',
    'PokerSquare'  : 'TCO08 Qual 3A',
    'RandomNetwork': 'TCO08 Qual 3A',
    # Same with this one:
    'DNAMatching'    : "TCO'10 Qualifier 1A",
    'Palindromize3'  : "TCO'10 Qualifier 1A",
    'MegadiamondHunt': "TCO'10 Qualifier 1A",
    # SRM 377 is not in the stats, and I have no idea why:
    'AlmostPrimeNumbers'  : 'SRM 377',
    'SquaresInsideLattice': 'SRM 377',
    'GameOnAGraph'        : 'SRM 377',
    'AlienLanguage'       : 'SRM 377',
    # Member SRM 471 is gone from statistics, maybe it was cancelled, maybe
    # it is one of those matches that suffer the fushar curse - a bug that
    # removes matches by fushar from statistics.
    'PrimeContainers'  : 'Member SRM 471',
    'EllysPlaylists'   : 'Member SRM 471',
    'Thirteen'         : 'Member SRM 471',
    'PrimeSequences'   : 'Member SRM 471',
    'ThirteenHard'     : 'Member SRM 471',
    'ConstructPolyline': 'Member SRM 471',
}

#------------

REGEX_CONTEST = re.compile(r'([a-zA-Z0-9]+)\s*<.A>\s*<.TD>\s*<TD[^>]+>\s*<A\s+HREF="[^=]+[ce]=[a-zA-Z]+ound[a-zA-Z_]+verview&rd=[0-9]+"[^>]+>\s*(.+)\s*</A>')

CACHE = dict()

QUERY_COUNT = 0

def better(c1, c2):
    # when a class has multiple contests, pick the better one.
    # Parallel rounds:
    p1 = 'arallel' in c1
    p2 = 'arallel' in c2
    if p1 and not p2:
        return c2
    elif p2 and not p1:
        return c1

    # some problems were used in special contests and also in SRMs, put priority in SRM:
    s1 = c1.startswith('SRM')
    s2 = c2.startswith('SRM')
    if s1 and not s2:
        return c1
    elif s2 and not s1:
        return c2

    s1 = ('SRM' in c1)
    s2 = ('SRM' in c2)
    if s1 and not s2:
        return c1
    elif s2 and not s1:
        return c2

    return c1 #any one

def getProblemContest(handle):
    global CACHE, QUERY_COUNT
    if handle in CACHE:
        return CACHE[handle]

    #CACHE memoizes the following function:
    def sub(handle):
        if handle in SPECIAL:
            return SPECIAL[handle]

        # Wait 0.5 seconds between queries, and 2 seconds every 10 queries
        # we are not DDoSing TC...
        global QUERY_COUNT
        if QUERY_COUNT == 10:
            QUERY_COUNT = 0
            time.sleep(2.0)
        else:
            time.sleep(0.5)
        QUERY_COUNT += 1

        #download the problem archive search results
        params = urllib.urlencode({'module': 'ProblemArchive', 'class': handle})
        f = urllib.urlopen("http://community.topcoder.com/tc?%s" % params, proxies = PROXIES)

        html = f.read()

        # extract the contest names (or at least try), keeping only the
        # rows whose class name matches exactly
        m = [x[1] for x in REGEX_CONTEST.findall(html) if x[0] == handle]

        # Some problems were used in test SRMs, with a '2' added to their class name:
        if len(m) < 1:
            if len(handle) >= 1 and handle[-1] == '2':
                return getProblemContest(handle[:-1])
            return None
        return reduce(better, m)

    CACHE[handle] = sub(handle)
    return CACHE[handle]


def inFolder():
    for f in os.listdir('.'):
        if os.path.isfile(f) and f not in BLACK_LIST:
            c, ext = os.path.splitext(f)
            if ext in EXTENSIONS:
                s = getProblemContest(c)
                if s is not None:
                    # '/' cannot appear in a folder name
                    s = s.replace('/', '-')
                    if not os.path.isdir(s):
                        print ('mkdir "./%s"' % s)
                        os.mkdir(s)

                    print ('mv %s "./%s/"' % (f, s))
                    os.rename(f, os.path.join(s, f))
                else:
                    print ('%s: contest not found.' % c)

inFolder()
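Since the script renames files in place, it can be worth previewing the moves first. The following is a minimal dry-run sketch, not part of the original script: plan_moves is a hypothetical helper that only computes (source, destination) pairs, and the lookup is a plain dictionary standing in for getProblemContest so that nothing is downloaded or moved.

```python
import os

EXTENSIONS = [".cpp", ".java", ".cs", ".vb", ".py"]

def plan_moves(filenames, lookup, skip=()):
    """Return (source, destination) pairs without touching the disk."""
    moves = []
    for name in filenames:
        base, ext = os.path.splitext(name)
        if name in skip or ext not in EXTENSIONS:
            continue
        contest = lookup(base)
        if contest is None:
            continue  # contest not found, leave the file where it is
        folder = contest.replace('/', '-')  # same clean-up as the script
        moves.append((name, os.path.join(folder, name)))
    return moves

# Stand-in for getProblemContest: two entries taken from the SPECIAL table.
KNOWN = {'ChessTourney': 'TCO08 Qual 3A', 'Thirteen': 'Member SRM 471'}

planned = plan_moves(
    ['ChessTourney.cpp', 'Thirteen.java', 'notes.txt', 'template.cpp'],
    KNOWN.get, skip=['template.cpp'])
for src, dst in planned:
    print('%s -> %s' % (src, dst))
```

Passing the real lookup function instead of KNOWN.get would give a preview of what inFolder() is about to do.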
Posted in snippet, topcoder