v.coderのコメントにフォローアップするために、私は彼(または彼女)がPythonで参照していたコードを書き直し、何が起こっているのかを説明しようとするコメントを追加しました。
def removeduplicates(s):
"""Original java implementation by
Druv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
in his/her answer
http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
"""
# python strings are immutable, so first converting the string to a list of integers,
# each integer representing the ascii value of the letter
# (hint: look up "ascii table" on the web)
L = [ord(char) for char in s]
# easiest solution is to use a set, but to use Druv Gairola's method...
# (hint, look up "bitmaps" on the web to learn more!)
bitmap = 0
#seen = set()
for index, char in enumerate(L):
# first check for duplicates:
# number of bits to shift left (the space is the "lowest"
# character on the ascii table, and 'char' here is the position
# of the current character in the ascii table. so if 'char' is
# a space, the shift length will be 0, if 'char' is '!', shift
# length will be 1, and so on. This naturally requires the
# integer to actually have as many "bit positions" as there are
# characters in the ascii table from the space to the ~,
# but python uses "very big integers" (BigNums? I am not really
# sure here..) - so that's probably going to be fine..
shift_length = char - ord(' ')
# make a new integer where only one bit is set;
# the bit position the character corresponds to
bit_position = 1 << shift_length
# if the same bit is already set [to 1] in the bitmap,
# the result of AND'ing the two integers together
# will be an integer where that only that exact bit is
# set - but that still means that the integer will be greater
# than zero. (assuming that the so-called "sign bit" of the
# integer doesn't get set. Again, I am not entirely sure about
# how python handles integers this big internally.. but it
# seems to work fine...)
bit_position_already_occupied = bitmap & bit_position > 0
if bit_position_already_occupied:
#if char in seen:
L[index] = 0
else:
# update the bitmap to indicate that this character
# is now seen.
# so, same procedure as above. first find the bit position
# this character represents...
bit_position = char - ord(' ')
# make an integer that has a single bit set:
# the bit that corresponds to the position of the character
integer = 1 << bit_position
# "add" the bit to the bitmap. The way we do this is that
# we OR the current bitmap with the integer that has the
# required bit set to 1. The result of OR'ing two integers
# is that all bits that are set to 1 in *either* of the two
# will be set to 1 in the result.
bitmap = bitmap | integer
#seen.add(char)
# finally, turn the list back to a string to be able to return it
# (again, just kind of a way to "get around" immutable python strings)
return ''.join(chr(i) for i in L if i != 0)
if __name__ == "__main__":
print(removeduplicates('aaaa'))
print(removeduplicates('aabcdee'))
print(removeduplicates('aabbccddeeefffff'))
print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][ ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
「set」はユニークなアイテムを取得するのに適しています。しかし、たとえ秩序が重要であったとしても、 'sorted(set(s))'は最初の順序を返さないでしょう。例えば、 '' '.join(sort(set(' abcfbcdd '))) 'は' abcdf'を返しますが、最初の順序は 'abcfd'です。 – RomanPerekhrest
はい' Set 'は追加バッファとしてカウントされます。 @Dhruv Gairolaの答えのO(n)解を見てください。http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string –