CS2040S编程 写作、 辅导Algorithms程序

” CS2040S编程 写作、 辅导Algorithms程序CS2040S: Data Structures and AlgorithmsProblem Set 4Due: TBACollaboration Policy. You are encouraged to work with other students on solving these problems.However, you must write up your solution by yourself. We may, randomly, ask youquestions about your solution, and if you cannot answer them, we will assume you have cheated.In addition, when you write up your solution, you must list the names of every collaborator, thatis, every other person that you talked to about the problem (even if you only discussed it briefly).You can do so by leaving a comment at the start of your .java file. Any deviation from this policywill be considered cheating, and will be punished severely, including referral to the NUS Board ofDiscipline.1Problem 4. (Make it Smaller)The goal of this problem is compression: take a file and make it smaller. We will take every byteof the input file and re-encode it so that the entire file is smaller.By default, each ASCII character is stored using 8-bits. But this is quite inefficient: some characters(like e) appear much more frequently. Others appear much less commonly. We could save spaceby encoding an e using only 3 bits, while using 12 bits to store every z in the document. To bemore precise, imagine a document has W characters in it, and character ci appears wi times in thedocument. Then the ideal thing to do is store character ci using log(W/wi) bits.1In this problem, We will use a binary tree to generate the codewords.Each leaf in the tree will represent a symbol in the input file. The codeword is found by looking atthe path from the root to the leaf containing the symbol: start at the root, and every time you goleft, add a 0 to the codeword; every time you right, add a 1 to the codeword. When you reachthe leaf, return the codeword itself.A key property of this type of code is that they are prefix-free codes: if c is the codeword for somesymbol, then c is not a prefix for any other symbol (e.g. {00, 01} is prefix-free but {00, 001} is notprefix-free because 00 is a codeword and a prefix of 001). This is very useful because it means that1The reason why that is Optimal is related to the idea of entropy.2if we ever see the codeword c, we know exactly how to decode it (without worrying that it is partof some longer codeword).In this problem, we will provide you with most of the mechanism for encoding and decoding files.The only part you have to do is the part related to tree manipulation: build the tree, and implementtwo query methods: one that translates symbols to codewords, and one that translates codewordsto symbols.There are two versions of this Problem: the easy version and the harder version. You are requiredto do one of them (and a small bonus for doing the harder one). In the easier version, you can ignore the frequency with which each symbol appears. In thiscase, you only have to build a balanced tree. However, the compression performance will notbe as good, Since you are not getting any benefit from prioritizing repeated characters. In the harder version, you will weight the tree according to the frequency that each itemsshows up. This will get you close to an optimal encoding.The easier version should be implemented in the UniformTree class, while the harder version shouldbe implemented in the WeightedTree class.Problem 4.a. Your first task is to implement the buildTree routine. In both cases the goal isto build a tree where all the symbols are stored at the leaves (i.e., the internal nodes do not containany symbols, just guides for searching). The tree should be in BST-order.It is important that the Symbols are at the leaves, since that ensures that no codeword is a prefixof another codeword!Easier version. The buildTree method is called in the constructor. An array of bytes is passedin, each representing a unique symbol in the input file. When the buildTree routine finishes,the member variable root should be the root of a balanced binary tree. The tree should be inBST-order (with respect to the symbols) so that we can search the tree for a symbol efficiently. Wehave included a TreeNode class as part of the UniformNode class; you may want to look at that.(You should not need to modify it.)You can assume that there is 2n unique characters, making it possible to build a full binary searchtree.Try to make your tree construction algorithm as efficient as possible, e.g., linear time.Harder version. In this case, the buildTree method takes as input an array of pairs: thesymbols and their weight (i.e., the number of times it appears in the input document). Again, youwant to build a tree that is in BST-order, so that it can be efficiently searched. The Pair typesupports the Comparable interface, so you can sort it using system sort (java.utils.Arrays.sort)and it will sort by the symbol.3However, now the tree should be weight-balanced rather than simply balanced: the sum of theweight on the left side should be about equal to the sum of the weight on the right side.Of course, exact balance cannot always be achieved because weights are discrete. Thus the goal isas follows: if W is the weight of some tree node v, ensure that neither the weight of vs left childnor that of vs right child exceeds 2W/3.There is one exception to this rule: if there is one item x that is of size at least W/3 all by itself,then the side containing that item may have more than 2W/3 weight. However, at the very nextlevel of the tree, x Should become a leaf.In the above example, the node on the right is a weight ratio that is more than 2W/3, and is aintermediate node, so it is not allowed. However, in the second example, although the weight is 0.1and 0.4, this is acceptable because the 0.4 is a single node on its own.If you do the construction properly, an item with weight wi should be at a leaf of depth O(log(W/wi)),(i.e., an optimal depth), where W is the total weight of the tree,Try to make your tree construction algorithm as efficient as possible, e.g., O(n log n) time.Problem 4.b. Your second task is to implement a query to find a codeword. The queryCodemethod takes a key as an input (in the form of a byte) and returns the codeword, in the form of aboolean array. (We will think of the codeword as a boolean array because we will write it to thefile as a sequence of bits.)You should be able to implement this by walking the tree from the root to the leaf of the appropriatesymbol. If the symbol is not found at a leaf, then return null.For the codeword returned, if code is the boolean array, then code[0] should be the first bit ofthe codeword (i.e., the step taken from the root), while code[1] should be the second bit of thecodeword, etc.Implement the query in an efficient manner.4Problem 4.c. Your third task is to implement a query to find a symbol, given a codeword.The query method takes a boolean array code and an integer bits as an input. The methodwill then output a key (in the form of a byte). The code array is only valid in the range fromcode[0..bits-1]. (The rest of the array is declared in advance to act as a buffer, since we do notknow how big the Codeword will be in advance.)The goal, then, is to lookup the codeword code[0..bits-1] in the tree and if the result is a leaf,return the symbol found. Otherwise, return null.For the codeword, if code is the boolean array, then code[0] should be the first bit of the codeword(i.e., the step taken from the root), while code[1] should be the second bit of the codeword, etc.Implement the query in an efficient manner.Once you have implemented the above three methods, the compressor and decompressor classesshould work. Simply update the input and output files in the main routine of each of them and youshould be able to Compress and decompress accordingly. How good a compression rate do you get?(For text files with 8-bit ASCII text, I would expect you can save at least 3 bits per character.)Problem 4.d. (Optional.) You can get better compression by being a little more clever. Thereare several things you might try. A natural thing to try is to build a Huffman Tree. How muchbetter performance do You get? Another idea is that you might do better in text documents if youlooked at words instead of characters: what if you map each word in the text to a leaf in the tree?Do you get better compression? What is the Best compression you can get for, say, Hamlet?如有需要,请加QQ:99515681 或WX:codehelp

添加老师微信回复‘’官网 辅导‘’获取专业老师帮助,或点击联系老师1对1在线指导