I've looked for a tool like this myself but never found one, so I usually just write a script for it. Here is a sample, with some limitations, that might be of use to you:
```python
import concurrent.futures
import logging
from collections import Counter

# Toy corpus: the same 7-word sentence repeated 10 times (70 tokens).
tokens = []
for _ in range(10):
    tokens.extend(['lazy', 'old', 'fart', 'lying', 'on', 'the', 'bed'])


def cooccurrances(idx, tokens, window_size):
    # Pair tokens[idx] with each of the next (window_size - 1) tokens.
    # Returns a list rather than a generator, so the work really happens
    # inside the worker thread instead of later in the main thread.
    window = tokens[idx:idx + window_size]
    first_token = window.pop(0)
    return [(first_token, second_token) for second_token in window]


def harvest_cooccurrances(tokens, window_size=3, n_workers=5):
    # Beware: this submits one task per token and keeps every pair in memory,
    # so it will backfire if you feed it large files (long token lists).
    harvest = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
        future_cooccurrances = {
            executor.submit(cooccurrances, idx, tokens, window_size): idx
            for idx in range(len(tokens))
        }
        for future in concurrent.futures.as_completed(future_cooccurrances):
            try:
                harvest.extend(future.result())
            except Exception:
                # Log the start index that failed and skip it.
                logging.exception("failed at index %d", future_cooccurrances[future])
    return harvest


def count(harvest):
    # Turn {(first, second): n} into a list of (first, second, n) triples.
    return [
        (first_word, second_word, n)
        for (first_word, second_word), n in Counter(harvest).items()
    ]


harvest = harvest_cooccurrances(tokens, 3, 5)
counts = count(harvest)
print(counts)
```
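If you just run the code, you should see something like this (the order of the entries may vary between runs, because results are collected as the threads finish):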
```
[('lazy', 'old', 10),
 ('lazy', 'fart', 10),
 ('fart', 'lying', 10),
 ('fart', 'on', 10),
 ('lying', 'on', 10),
 ('lying', 'the', 10),
 ('on', 'the', 10),
 ('on', 'bed', 10),
 ('old', 'fart', 10),
 ('old', 'lying', 10),
 ('the', 'bed', 10),
 ('the', 'lazy', 9),
 ('bed', 'lazy', 9),
 ('bed', 'old', 9)]
```
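Pairs inside one copy of the sentence occur 10 times, while the three pairs that span the "bed → lazy" boundary between copies (`the`/`lazy`, `bed`/`lazy`, `bed`/`old`) occur only 9 times, since 10 copies have only 9 boundaries between them. As a minimal sketch, assuming you don't actually need threads, the same counts can be produced sequentially with `zip` and `collections.Counter`; the helper name `window_pairs` is hypothetical, not from the original answer:

```python
from collections import Counter


def window_pairs(tokens, window_size=3):
    # Pair each token with each of the next (window_size - 1) tokens,
    # i.e. the same pairs that cooccurrances() produces per index.
    for offset in range(1, window_size):
        # zip stops at the shorter sequence, so no pair runs past the end.
        yield from zip(tokens, tokens[offset:])


tokens = ['lazy', 'old', 'fart', 'lying', 'on', 'the', 'bed'] * 10
counts = Counter(window_pairs(tokens, window_size=3))

# Sorting gives a stable order, unlike the thread-pool version.
for (first_word, second_word), n in sorted(counts.items()):
    print(first_word, second_word, n)
```

Because there is no thread pool, the result does not depend on completion order, and sorting the items makes the printout reproducible.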
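Limitations:

- This is a wild guess, but in my experience it can still be a little flaky.
- One task is submitted per token and every pair is kept in memory, so large files (long token lists) will backfire.
- Because of the GIL, threads do not speed up pure-Python, CPU-bound work like this; the thread pool mostly adds overhead.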
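To work around the large-file limitation, one option is to stream tokens and keep only a small sliding buffer instead of the whole token list. This is a minimal sketch, assuming whitespace tokenization is good enough; `stream_pairs` and `tokens_from_file` are hypothetical helpers, and "corpus.txt" is a placeholder path:

```python
from collections import Counter, deque
from typing import Iterable, Iterator, Tuple


def stream_pairs(tokens: Iterable[str], window_size: int = 3) -> Iterator[Tuple[str, str]]:
    # Keep only the last (window_size - 1) tokens in memory.
    recent = deque(maxlen=window_size - 1)
    for token in tokens:
        # Each buffered earlier token forms an (earlier, current) pair,
        # which is the same pair set the windowed version produces.
        for earlier in recent:
            yield earlier, token
        recent.append(token)


def tokens_from_file(path: str) -> Iterator[str]:
    # Hypothetical tokenizer: read one line at a time and split on whitespace.
    # Tokens from consecutive lines are treated as one continuous stream.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield from line.split()


counts = Counter(stream_pairs(['lazy', 'old', 'fart', 'lying', 'on', 'the', 'bed'] * 10))
print(sorted(counts.items()))
```

For a real file you would call something like `Counter(stream_pairs(tokens_from_file("corpus.txt")))`. Memory then grows with the number of distinct pairs rather than with the length of the input.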