{
   "name": "spambase",
   "title": "Spambase",
   "resources": [
      {
         "path": "spambase.arff",
         "pathType": "local",
         "name": "spambase_arff",
         "format": "arff",
         "encoding": "ISO-8859-1"
      },
      {
         "path": "spambase.csv",
         "pathType": "local",
         "name": "spambase",
         "format": "csv",
         "mediatype": "text/csv",
         "encoding": "ISO-8859-1",
         "dialect": {
            "delimiter": ",",
            "quoteChar": "\""
         },
         "schema": {
            "fields": [
               {
                  "name": "word_freq_make",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_address",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_all",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_3d",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_our",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_over",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_remove",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_internet",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_order",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_mail",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_receive",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_will",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_people",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_report",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_addresses",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_free",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_business",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_email",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_you",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_credit",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_your",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_font",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_000",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_money",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_hp",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_hpl",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_george",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_650",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_lab",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_labs",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_telnet",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_857",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_data",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_415",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_85",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_technology",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_1999",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_parts",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_pm",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_direct",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_cs",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_meeting",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_original",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_project",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_re",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_edu",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_table",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "word_freq_conference",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%3B",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%28",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%5B",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%21",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%24",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "char_freq_%23",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "capital_run_length_average",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "capital_run_length_longest",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "capital_run_length_total",
                  "type": "number",
                  "format": "default"
               },
               {
                  "name": "class",
                  "type": "number",
                  "format": "default"
               }
            ],
            "missingValues": [
               ""
            ]
         }
      }
   ],
   "readme": "The resources for this dataset can be found at https://www.openml.org/d/44\n\nAuthor: Mark Hopkins, Erik Reeber, George Forman, Jaap Suermondt    \nSource: [UCI](https://archive.ics.uci.edu/ml/datasets/spambase)   \nPlease cite: [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)\n\nSPAM E-mail Database  \nThe \"spam\" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster and individuals who had filed spam.  Our collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam.  These are useful when constructing a personalized spam filter.  One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general purpose spam filter.\n \nFor background on spam:  \nCranor, Lorrie F., LaMacchia, Brian A.  Spam! Communications of the ACM, 41(8):74-83, 1998.  \n\n### Attribute Information:  \nThe last column denotes whether the e-mail was considered spam (1) or not (0), i.e. unsolicited commercial e-mail. Most of the attributes indicate whether a particular word or character was frequently occurring in the e-mail. The run-length attributes (55-57) measure the length of sequences of consecutive capital letters.  \n\nFor the statistical measures of each attribute, see the end of this file. Here are the definitions of the attributes:  \n\n48 continuous real [0,100] attributes of type  \nword_freq_WORD = percentage of words in the e-mail that match WORD,  i.e. 100 * (number of times the WORD appears in the e-mail) / total number of words in e-mail.  A \"word\" in this case is any string of alphanumeric characters bounded by non-alphanumeric characters or end-of-string.\n \n6 continuous real [0,100] attributes of type char_freq_CHAR = percentage of characters in the e-mail that match CHAR, i.e. 100 * (number of CHAR occurrences) / total characters in e-mail\n \n1 continuous real [1,...] attribute of type capital_run_length_average\n = average length of uninterrupted sequences of capital letters\n \n1 continuous integer [1,...] attribute of type capital_run_length_longest\n = length of longest uninterrupted sequence of capital letters\n \n1 continuous integer [1,...] attribute of type capital_run_length_total\n = sum of length of uninterrupted sequences of capital letters\n = total number of capital letters in the e-mail\n \n1 nominal {0,1} class attribute of type spam\n = denotes whether the e-mail was considered spam (1) or not (0), \n i.e. unsolicited commercial e-mail.  ",
   "description": "The resources for this dataset can be found at https://www.openml.org/d/44\n\nAuthor: Mark Hopkins, Er",
   "licenses": [
      {
         "name": "ODC-PDDL",
         "path": "http://opendatacommons.org/licenses/pddl/",
         "title": "Open Data Commons Public Domain Dedication and License"
      }
   ]
}