9. Text Processing (3)

컴퓨터 과학!/Algorithms2004. 11. 7. 01:38

9. Text Processing (3)

9.3 Text Compression
Text compression은 low-bandwidth channel 이나 fixed-capacity storage device 등에서 중요한 문제이다. ASCII(16bits)나 Unicode system(16bits)에서는 fixed length binary string을 사용하는데 반해, Huffman code는 frequency가 클 수록 더 짧은 길이의 code를 부여하는 variable length encoding 방법을 사용하여 space를 줄인다.
이런 variable length code를 사용하기 위해서는, 이것은 prefix code여야한다. Prefix code란, 'no code word in our encoding is a prefix of another code word in its encoding'이다. 사실, 뜻은 prefix-free code이다. 이 때문에 code word 길이는 길어질 수 있지만, 그래도 fixed length를 사용하는 것 보다는 공간이 절약된다. 특히나, 인간이 사용하는 언어와 같이 frequency들 간의 차이가 큰 경우 더욱 효율적이다.

1) The Huffman Coding Algorithm
⊙ Huffman tree 만들기
root를 single-binary tree로 시작하여, frequency가 낮은 것부터 merge한다.

Huffman code algorithm and its construction

d개의 distinct char가 있고, text 길이가 n이라 하면, 두 binary tree를 merge하는 while-loop 동작은 d-1번 수행되고, priority queue 유지하는데 O(logd)걸리므로, 이 Huffman tree를 만드는 데 걸리는 시간은 O(n + dlogd)가 된다.

2)The Greedy Method Revisited
Huffman's Algorithm은 일종의 Greedy method이다. ...

Posted by 스니

일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

스니스니 닷넷

9. Text Processing (3)

카테고리

공지사항

태그목록

최근에 올라온 글

최근에 달린 댓글

최근에 받은 트랙백

글 보관함

달력

링크

티스토리툴바