Talk:Hash table
This is the talk page for discussing improvements to the Hash table article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
Archives: 1, 2, 3Auto-archiving period: 12 months |
This level-5 vital article is rated B-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||||||||||||
|
Unfinished question
[edit]Problem 1: The Hash Table will be used for storage of student records. You may assume the maximum number of Items will not exceed 240. An item corresponds to a student record. A key is taken to be an ASCII character string of maximum length 32 containing name of the student. The data is taken to be 9-digit id and the name of the state of residence of the student in India. A list node contains an Item and a reference to the next node. You will need to implement the classes Item and the ListNode with construct and appropriate data access operations. 2. Write a main() program to test your class HashTable. 3. Input: The name of the input file, containing data items (student records) and data operations in the format given below, and the name of the output file, will be passed to your program on the command line or interactively. Data Items: student-id, student-name, state-of-residence Example. 200412018, Swati Mishra, Uttar Pradesh one per line. Data Operations: <operation> <item> The <operation> is s,i, or d for search, insert and delete, respectively. The item will have fields: student-name, student-id, and state-of-residence. The fields of an item will be separated by commas, and operation will be separated from the item by a colon ”:” and a space. Example. s: 200211001, Akash Gokhale, Maharashtra one per line. The data items will be separated from the data operations by a blank line. 4. Output: Your program is to read the input file and populate the Hash Table with student records by repeated Insert() operations. And then print to the output file, the size of the Hash Table, and the size of each linked list in the Hash Table. Then, it will continue to read each line and execute appropriate data operations. Following each Insert()/Delete() operation, it will output the size of the Hash Table and the size of each of the linked list, and for each Search operation, it will output
-- 220.225.53.35 09:18, 11 October 2006
Lead
[edit]The lead says: Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions...Such collisions must be accommodated in some way. .
We don't live in an ideal world, so starting out with a statement about ideal is supercilious. The implication is that if a hash function is 'imperfect', like an assembly line making imperfect parts, why don't we fix it so it's 'perfect'? An imperfect hash function might cause hash collisions, but the sense of this statement is vague, and beggars the question, why don't we build one that 'might not' cause hash collisions? Easy enough to postulate, eh? We might lead a novice into actually attempting to do that. I wish him luck.
How about this in place of the whole paragraph: Hash functions are designed to be efficient to compute and minimize collisions, duplicate hash codes resulting from distinct keys. 19 words versus 48. And it's very informative. The lead is not the place to go into what happens in a collision: we may assume, and indeed we find, a whole section talking about just that. Hmmm... or we could add, Hash functions are accompanied by an adjunct collision-resolution method. Now I think we've got a substantial digest. Scholarly diction is an art. Sbalfour (talk) 21:47, 24 September 2019 (UTC)
- "Perfect hash function" is a term of art - http://en.wiki.x.io/wiki/Perfect_hash_function. So the characterisation of non-injective hash functions as "imperfect" here isn't meant as a slight against them or a value judgement on their quality or usefulness, it's just a jargon way of saying that they have collisions.
- I think I nonetheless agree with you that the passage feels a bit weird to me, for a few reasons:
- It doesn't link to Perfect hash function, and it's non-obvious what "imperfect" means here unless you either happen to already know the jargon or you get a spidey sense that it's a jargon term and google it
- It's kind of tautological. "most hash table designs employ an imperfect hash function, which might cause hash collisions" - yes, that's literally what being imperfect means.
- The "ideal" scenario is simply impossible in the extremely common scenario that there are infinitely many possible keys (e.g. keys are strings of arbitrary length). You cannot have a perfect hash function with an infinite domain.
- ExplodingCabbage (talk) 17:33, 10 July 2024 (UTC)
Let's try again. The lead says, In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason,.... What reason? It's a dangling participle, with look-thru ambiguity, it's not attached to anything. If we turn the sentence on its head, we get: Hash tables are more efficient in many situations than other things because <reason>. We need to state the reason, then we can say, For that reason,... How about Hash tables provide nearly constant access time and compact storage for large numbers of records, that can't be efficiently stored or accessed by other methods. There's ordered and unordered lists, linked lists, double-linked lists, circularly-linked lists, tries, heaps, stacks, queues, and etc, too numerous to mention in the lead, so don't try. If it matters - and it's worth a paragraph or maybe a sub-sub-section - there we distinguish the situations where we might use, other methods. Sbalfour (talk) 22:19, 24 September 2019 (UTC)
- I don't understand this second point of yours. It's not a dangling participle, it's unambiguous that the reason being talked about is that hash tables are more efficient than other stuff, and I have no idea what "look-thru ambiguity" / "look-through ambiguity" is meant to mean because I find almost no Google hits for either term (indeed this Talk page is the ONLY hit for "look-thru ambiguity"). ExplodingCabbage (talk) 17:35, 10 July 2024 (UTC)
Variable-sized data and chained hashing
[edit]Would this be a good citation for the better performance of variable-sized data in chaining hash tables? https://www.researchgate.net/profile/Haval_Ahmed2/post/Any_ideas_about_hash_table_implementation/attachment/59d6430bc49f478072eabc10/AS:273806436831232@1442291949107/download/In-memory+hash+tables+for+accumulating+text+vocabularies.pdf ?
First sentence of section "2.3. Hash tables" seems to be useful, but it is not backed up in the article itself.
Introduction is incorrect
[edit]Introduction sentence and the claim that Hash tables are implemented as associative arrays is incorrect. Even the linked reference says that associative arrays are implemented as hash tables, not the other way around. Currently associative array and hash table articles are in an infinite loop as each claims they are implemented using the other. 2001:7D0:87E3:100:C528:86D0:1049:5205 (talk) 06:18, 27 October 2023 (UTC)
- The article currently says, in the introductory sentence:
In computing, a hash table, also known as a hash map, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values.
- which says that a hash table implements an associative array, i.e. that an associative array is implemented as a hash table not the other way around. Guy Harris (talk) 06:44, 14 November 2023 (UTC)
Hash digest
[edit]Where is the hash digest ...
This explanation in section Hashing by division
is unclear because the term hash digest is not explained and does not appear in the equation above it. (Also, it should probably not be capitalized because it is not a complete sentence.)
In the following section, Hashing by multiplication
, it is the other way around: appears in the equation but is not explained. Gnib Bnil (talk) 14:55, 5 June 2024 (UTC)