TechNOTE

Aligning a corpus with Fastalign

NLP

JU1234 2021. 10. 27. 15:04

1. Install fast_align

Clone the GitHub repository below and follow the build instructions in its README:

https://github.com/clab/fast_align.git

 


2. Prepare tokenized data 

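In my case the files are source.tok.txt and target.tok.txt. Each file holds one tokenized sentence per line, and line N of the source file must be the translation of line N of the target file.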
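Before combining the files, it can help to confirm that both sides have the same number of lines, since a single missing line shifts every later sentence pair. The following is a minimal sketch, not part of fast_align itself; the temporary directory and the two sample sentences are hypothetical stand-ins for the real tokenized files.

```shell
# Hypothetical sample data in a scratch directory (stand-ins for the real files).
workdir=$(mktemp -d)
src="$workdir/source.tok.txt"
tgt="$workdir/target.tok.txt"
printf 'das Haus ist klein\nes regnet\n' > "$src"
printf 'the house is small\nit is raining\n' > "$tgt"

# Count lines with awk so the result has no padding (wc -l pads on some systems).
src_lines=$(awk 'END { print NR }' "$src")
tgt_lines=$(awk 'END { print NR }' "$tgt")

# The two files are only usable as a parallel corpus if the counts match.
if [ "$src_lines" -eq "$tgt_lines" ]; then
    status="ok"
else
    status="mismatch"
fi
echo "line counts: $src_lines vs $tgt_lines ($status)"
```

To check real data, point src and tgt at the actual files and drop the two printf lines.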

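3. Combine tokenized data

fast_align reads one sentence pair per line, with the source and target sentences separated by " ||| ". Join the two files line by line and replace the tab that paste inserts with that separator:

paste {src} {tgt} | sed 's/ *\t */ ||| /g' > {src}-{tgt}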
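To make the combined format concrete, here is the same paste and sed pipeline run on two hypothetical sample sentences in a scratch directory. One assumption to note: the \t escape inside the sed expression is understood by GNU sed, so on other sed implementations the tab may need to be written differently.

```shell
# Hypothetical sample data (stand-ins for the real tokenized files).
workdir=$(mktemp -d)
src="$workdir/source.tok.txt"
tgt="$workdir/target.tok.txt"
printf 'das Haus ist klein\nes regnet\n' > "$src"
printf 'the house is small\nit is raining\n' > "$tgt"

# paste joins line N of both files with a tab;
# sed turns that tab (and any spaces around it) into " ||| ".
combined="$workdir/source-target"
paste "$src" "$tgt" | sed 's/ *\t */ ||| /g' > "$combined"

cat "$combined"
# das Haus ist klein ||| the house is small
# es regnet ||| it is raining
```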

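4. Run fast_align

The -i option names the combined input file. -d favors alignment points close to the diagonal, -o optimizes how strongly the diagonal is favored, and -v uses a Dirichlet prior on the lexical translation distributions. The alignments are written to standard output, one line per sentence pair:

fast_align -d -v -o -i {src}-{tgt} > align.txt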
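Each line of align.txt lists pairs of the form i-j, meaning that word i of the source sentence is aligned to word j of the target sentence, both counted from zero. The sketch below shows one way to turn those index pairs back into word pairs for inspection. The alignment line here is written by hand for illustration and is not real fast_align output; the file names and the scratch directory are likewise hypothetical.

```shell
# Hypothetical inputs: one combined sentence pair and a hand-written alignment line.
workdir=$(mktemp -d)
printf 'das Haus ist klein ||| the house is small\n' > "$workdir/source-target"
printf '0-0 1-1 2-2 3-3\n' > "$workdir/align.txt"

# Put each sentence pair next to its alignment line, then resolve indices to words.
pairs=$(paste "$workdir/source-target" "$workdir/align.txt" | awk -F'\t' '{
    split($1, sides, / [|][|][|] /)   # sides[1] = source sentence, sides[2] = target
    split(sides[1], s, " ")           # source tokens (awk arrays start at 1)
    split(sides[2], t, " ")           # target tokens
    n = split($2, links, " ")         # alignment points such as "1-1"
    for (k = 1; k <= n; k++) {
        split(links[k], ij, "-")
        # fast_align indices are zero-based, so shift by one for awk
        print s[ij[1] + 1] " -> " t[ij[2] + 1]
    }
}')
echo "$pairs"
# das -> the
# Haus -> house
# ist -> is
# klein -> small
```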

 
